NOVEL HCV NON-STRUCTURAL POLYPEPTIDE



CROSS-REFERENCE TO RELATED APPLICATION 

This application is related to provisional patent application serial no. 60/167,502, filed November 24, 1999, from which priority is claimed under 35 USC §119(e)(1), and which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION 

The present invention relates to polypeptides comprising a mutant non-structural Hepatitis C virus ("HCV") polypeptide useful as immunogenic compounds for use against HCV, methods of preparing and using the same, and immunogenic compositions comprising the same. The present invention also relates to compositions comprising (a) a mutant non-structural HCV polypeptide and (b) a viral polypeptide that is not a non-structural HCV polypeptide, and methods of using these compositions.

BACKGROUND OF THE INVENTION

HCV is now recognized as the major agent of chronic hepatitis and liver disease 
worldwide. It is estimated that HCV infects about 400 million people worldwide, 
corresponding to more than 3% of the world population. 

Hepatitis C virus ("HCV") is a small enveloped RNA flavivirus, which contains a positive-stranded RNA genome of about 10 kilobases. The genome has a single uninterrupted ORF that encodes a protein of 3010-3011 amino acids. The HCV proteins include the structural proteins, namely a core protein (C), which is highly immunogenic, and two envelope proteins (E1 and E2), which likely form a heterodimer in vivo, as well as the non-structural proteins NS2-NS5. It is known that the NS3 region of the virus is important for post-translational processing of the polyprotein into individual proteins, and the NS5 region encodes an RNA-dependent RNA polymerase.

Virus-specific T lymphocytes, along with neutralizing antibodies, are the mainstay of the antiviral immune defense in established viral infections. Whereas CD8+ cytotoxic T cells eliminate virus-infected cells, CD4+ T helper cells are essential for the efficient regulation of the antiviral immune response. CD4+ T helper cells recognize specific antigens as peptides bound to autologous HLA class II molecules (viral antigens or particles are taken up by professional antigen-presenting cells, processed to peptides, bound to HLA class II molecules in the lysosomal compartment, and transported back to the cell surface). Several observations support an important role of CD4+ T cells in the elimination of HCV infection. Tsai et al, 1997 Hepatology 25:449-458; Diepolder et al 1995 Lancet 346:1006-1009; Missale et al 1996 JCI 98:706-714; Botarelli et al 1993 Gastro 104:580-587; Diepolder et al 1997 J. Virol 71:6011. Immunogenic peptides usually have a minimal length of 8-11 amino acids. However, since the peptide binding groove of HLA class II molecules seems to be open at both ends, longer peptides are tolerated. Thus peptides eluted from HLA class II molecules are typically in the range of 15-25 amino acids. HLA class II molecules are extremely polymorphic, and each allele seems to have its individual requirements for peptide binding. Thus the HLA class II repertoire of a given individual determines which viral peptides can be presented to T cells. Recognition of the specific HLA-peptide complex by the T cell receptor, accompanied by appropriate costimulatory signals, leads to T cell activation, secretion of cytokines, and T cell proliferation.

In numerous studies, HLA class II-restricted CD4+ responses have been determined by stimulating peripheral blood mononuclear cells with recombinant viral antigens or peptides. Botarelli et al, (1993) Gastroenterology 104:580-587; Ferrari et al, (1994) Hepatology 19:286-295; Minutello et al, (1993) J. Exp. Med. 178:17-25; Hoffmann et al, (1995) Hepatology 21:632-638; Iwata et al, (1995) Hepatology 22:1057-1064; and Tsai et al, (1995) Hepatology 21:908-912.

Polyclonal multispecific CD8+ T cell responses have been detected in patients with chronic hepatitis C. Additionally, CD8+ CTLs were shown to be important in resolving acute HCV infection in chimpanzees (Cooper et al, Immunity 1999). About 50% of patients with chronic hepatitis C demonstrate a detectable virus-specific CD4+ T cell response, which is most frequently directed against HCV core and/or NS4 and tends to be more common in patients who achieve sustained viral clearance during interferon-α therapy.

Depending on the pattern of lymphokines, CD4+ T helper cells have been classified as TH1, TH0, or TH2. Cytokines of the TH1 type are typically IFN-γ, lymphotoxin, and interleukin-2 (IL-2), which are believed to support activation of virus-specific CD8+ T cells and natural killer cells. The TH2 cytokines IL-4, IL-5, IL-10, and IL-13 are important for B cell activation and differentiation, thus inducing a humoral immune response.

During acute hepatitis C infection, a strong and sustained TH1/TH0 response to NS3, and possibly to other nonstructural proteins, is associated with a self-limited course of the disease. Diepolder et al (1995) Lancet 346:1006-1007 showed that all CD4+ T cell clones had a TH1 or TH0 cytokine profile, suggesting that the clones support cytotoxic immune mechanisms in vivo. The majority of CD4+ T cell clones responded to a relatively short segment of NS3, namely amino acids 1207-1278, suggesting that this region of NS3 is immunodominant for CD4+ T cells. More than 70% of those who contract HCV develop chronic infection and hepatitis, and a significant portion of them progress to cirrhosis and eventually hepatocellular carcinoma. The only approved therapy at present is a 6- to 12-month course of interferon-α, which leads to sustained improvement in only 20% of patients. So far, no commercial vaccine is available.

Thus, there remains a need for compositions and methods capable of promoting anti-HCV responses.

SUMMARY OF THE INVENTION 

In one aspect, the present invention relates to isolated polypeptides comprising mutant hepatitis C ("HCV") polypeptides comprising at least portions of NS3, NS4, and NS5. In a preferred aspect, NS3 is encoded by a nucleic acid sequence having an N-terminal deletion to remove the catalytic domain. The NS mutant polypeptides can include NS3, NS4a, NS4b, NS5a, NS5b or portions thereof. For example, in various embodiments, the mutant NS polypeptide comprises NS3, NS4 (NS4a and NS4b) and NS5 (NS5a and NS5b). In other embodiments, the NS polypeptide consists of NS3 and NS4 (for example, NS4a and/or NS4b) or NS3 and NS5 (for example, NS5a and/or NS5b). Other combinations of full-length non-structural components or fragments thereof are also contemplated.

In another preferred aspect, the polypeptides further comprise a viral polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides derived from other genomes, such as, for example, polypeptides of HBV. Thus, the invention includes an isolated mutant non-structural ("NS") HCV polypeptide comprising a polypeptide having a mutation in the catalytic domain of NS3 that functionally disrupts the catalytic domain. The mutation can be, for example, a deletion or a substitution mutation. In certain embodiments, the mutant NS polypeptide comprises NS3, NS4 and NS5. In other embodiments, the mutant NS polypeptides described herein further comprise a second viral polypeptide that is not NS3, NS4, or NS5 of HCV, for example an HCV Core polypeptide ("C"), or fragment thereof, or an HCV envelope protein ("E"), for example E1 and/or E2. In certain embodiments, C is truncated (e.g., at amino acid 121).

In another aspect, the present invention relates to compositions comprising any of the mutant hepatitis C ("HCV") polypeptides described herein, for example polypeptides comprising at least portions of NS3, NS4, and NS5. In a preferred aspect, NS3 is encoded by a nucleic acid sequence having an N-terminal deletion to disrupt the function of the catalytic domain, for example by removing this domain. In another preferred aspect, the polypeptides further comprise a viral polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides derived from other genomes, such as, for example, polypeptides of HBV. In another aspect, the invention includes a composition comprising (a) any of the polypeptides described herein; and (b) a pharmaceutically acceptable excipient (e.g., carrier and/or adjuvant).

In another aspect, the invention includes an isolated and purified polynucleotide which encodes any of the mutant HCV polypeptides described herein. In certain embodiments, the invention includes a composition comprising (a) the isolated purified polynucleotide encoding any of the mutant HCV polypeptides; and (b) a pharmaceutically acceptable excipient. The polynucleotide can be, for example, DNA, and can be contained in a plasmid. Additionally, the polynucleotides described herein may be included in an expression vector as shown in the attached Figures and Sequence Listings.

In another aspect, the present invention relates to host cells transformed with expression vectors comprising a nucleic acid sequence encoding a mutant HCV polypeptide comprising at least portions of NS3, NS4, and NS5. In a preferred aspect, the expression vectors of the host cells further comprise at least one nucleic acid sequence encoding a viral polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides derived from other genomes, such as, for example, polypeptides of HBV. In another preferred aspect, the nucleic acid sequences of the expression vectors are coexpressed. In yet another preferred aspect, the host cells are yeast cells or mammalian cells.

In another aspect, the present invention relates to expression vectors comprising a nucleic acid sequence encoding a mutant HCV polypeptide comprising NS3, NS4, and NS5. In a preferred aspect, the expression vectors further comprise at least one nucleic acid sequence encoding a viral polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Importantly, such polypeptides need not be encoded by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides derived from other genomes, such as, for example, polypeptides of HBV.

In another aspect, the present invention relates to methods of preparing mutant HCV polypeptides. In a preferred aspect, the method comprises the steps of transforming a host cell with an expression vector, said vector comprising a nucleic acid sequence encoding a mutant HCV polypeptide comprising at least portions of NS3, NS4, and NS5, and isolating said polypeptide. In another preferred aspect, the HCV polypeptide further comprises a viral polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides derived from other genomes, such as, for example, polypeptides of HBV. In another preferred aspect, the host cells are yeast cells or mammalian cells.

In another aspect, the present invention relates to antibodies which specifically bind to mutant HCV polypeptides comprising NS3, NS4, and NS5, and to methods of making and using the same. In a preferred aspect, the HCV polypeptide further comprises a viral polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides derived from other genomes, such as, for example, polypeptides of HBV. In another preferred aspect, the antibody is either monoclonal or polyclonal.

In yet another aspect, the invention includes a method of preparing a mutant NS HCV polypeptide, the method comprising the steps of (a) transforming a host cell with any of the expression vectors described herein, under conditions wherein the polypeptide is expressed; and (b) isolating the polypeptide. The host cell can be, for example, a yeast cell, a mammalian cell, a plant cell or an insect cell. The polypeptide can be expressed and isolated intracellularly or can be secreted and isolated from the surrounding environment.

In a still further aspect, a method of eliciting an immune response in a subject is provided. The immune response can be elicited by administering any of the polynucleotides and/or polypeptides described herein in one or multiple doses.

These and other embodiments of the subject invention will readily occur to those of skill in the art in light of the disclosure herein.



BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 shows the cloning scheme for generating pCMV-NS35.

FIG. 2 shows the 9621bp vector pCMV-NS35.

FIG. 3 shows the nucleic acid sequence of pCMV-NS35 (SEQ ID NO:1), including the nucleic acid sequence of the NS35 ORF, and also the translation of NS35 (SEQ ID NO:2).

FIG. 4 shows the 9621bp pCMV-delNS35.

FIG. 5 shows the nucleic acid sequence of pCMV-delNS35 (SEQ ID NO:3), including the nucleic acid sequence of the delNS35 ORF, and also the translation of the delNS35 polypeptide (SEQ ID NO:4).

FIG. 6 shows the 4276bp pCMV-II.

FIG. 7 shows the nucleic acid sequence of pCMV-II (SEQ ID NO:5).

FIG. 8 shows the 6300bp pCMV-NS34A.

FIG. 9 shows the nucleic acid sequence of pCMV-NS34A (SEQ ID NO:6), including the nucleic acid sequence of the NS34A ORF, and also the translation of NS34A (SEQ ID NO:7).

FIG. 10 shows the cloning scheme for generating pd.ΔNS3NS5.

FIG. 11 shows the nucleic and amino acid sequences of pd.ΔNS3NS5 (SEQ ID NOS:8 and 9).

FIG. 12 shows the Western blot of proteins expressed by S. cerevisiae strain AD3 transformed with pd.ΔNS3NS5.

FIG. 13 shows the cloning scheme for generating pd.ΔNS3NS5.pj.

FIG. 14 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj (SEQ ID NOS:10 and 11).

FIG. 15 shows the Western blot of proteins expressed by S. cerevisiae strain AD3 transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of ΔNS3NS5 polypeptide.

FIG. 16 shows the cloning scheme for generating pdΔNS3NS5.pj.core121RT and pdΔNS3NS5.pj.core173RT.

FIG. 17 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core121 (SEQ ID NOS:12 and 13).

FIG. 18 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core173 (SEQ ID NOS:14 and 15).

FIG. 19 shows the Western blot of proteins expressed by S. cerevisiae strain AD3 transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of ΔNS3NS5.core121 and ΔNS3NS5.core173 polypeptides. Lanes 1 and 7 show SeeBlue Standards. Lane 2 shows control yeast plasmid. Lanes 3 and 4 show ΔNS3NS5.core121RT polypeptide, colonies 1 and 2. Lanes 5 and 6 show ΔNS3NS5.core173RT polypeptide, colonies 3 and 4.

FIG. 20 shows the cloning scheme for generating pdΔNS3NS5.pj.core140RT and pdΔNS3NS5.pj.core150RT.

FIG. 21 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core140 (SEQ ID NOS:16 and 17).

FIG. 22 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core150 (SEQ ID NOS:18 and 19).

FIG. 23 shows the Western blot of proteins expressed by S. cerevisiae strain AD3 transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of ΔNS3NS5core140 and ΔNS3NS5core150 polypeptides. Lane 1 shows SeeBlue Standards. Lanes 2 and 3 show ΔNS3NS5core140RT polypeptide, colonies 5 and 6. Lanes 4 and 5 show ΔNS3NS5core150RT polypeptide, colonies 7 and 8. Lane 6 shows control yeast plasmid. Lane 7 shows ΔNS3NS5core121RT polypeptide, colony 1. Lane 8 shows ΔNS3NS5core173RT polypeptide, colony 5.

DETAILED DESCRIPTION OF THE INVENTION

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, microbiology, recombinant DNA techniques, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, et al., MOLECULAR CLONING: A LABORATORY MANUAL (1989); DNA CLONING, VOLUMES I AND II (D. N. Glover ed. 1985); OLIGONUCLEOTIDE SYNTHESIS (M. J. Gait ed., 1984); NUCLEIC ACID HYBRIDIZATION (B. D. Hames & S. J. Higgins eds. 1984); TRANSCRIPTION AND TRANSLATION (B. D. Hames & S. J. Higgins eds. 1984); ANIMAL CELL CULTURE (R. I. Freshney ed. 1986); IMMOBILIZED CELLS AND ENZYMES (IRL Press, 1986); B. Perbal, A PRACTICAL GUIDE TO MOLECULAR CLONING (1984); the series, METHODS IN ENZYMOLOGY (Academic Press, Inc.); GENE TRANSFER VECTORS FOR MAMMALIAN CELLS (J. H. Miller and M. P. Calos eds. 1987, Cold Spring Harbor Laboratory); Methods in Enzymology Vol. 154 and Vol. 155 (Wu and Grossman, and Wu, eds., respectively); Mayer and Walker eds. (1987), IMMUNOHISTOCHEMICAL METHODS IN CELL AND MOLECULAR BIOLOGY (Academic Press, London); Scopes, (1987), PROTEIN PURIFICATION: PRINCIPLES AND PRACTICE, Second Edition (Springer-Verlag, New York); and HANDBOOK OF EXPERIMENTAL IMMUNOLOGY, VOLUMES I-IV (D. M. Weir and C. C. Blackwell eds. 1986).

All publications, patents and patent applications cited herein, whether supra or 
infra, are hereby incorporated by reference in their entirety. 

It must be noted that, as used in this specification and the appended claims, the singular forms "a", "an" and "the" include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "an antigen" includes a mixture of two or more antigens, and the like.

I. Definitions

In describing the present invention, the following terms will be employed, and are 
intended to be defined as indicated below. 

The term "hepatitis C virus" (HCV) refers to an agent causative of Non-A, Non-B Hepatitis (NANBH). The nucleic acid sequence and putative amino acid sequence of HCV are described in U.S. Patent Nos. 5,856,437 and 5,350,671. The disease caused by HCV is called hepatitis C, formerly called NANBH. The term HCV, as used herein, denotes a viral species of which pathogenic strains cause NANBH, as well as attenuated strains or defective interfering particles derived therefrom.

HCV is a member of the viral family Flaviviridae. The morphology and composition of Flavivirus particles are known, and are discussed in Reed et al, Curr. Stud. Hematol. Blood Transfus. (1998), 62:1-37; HEPATITIS C VIRUSES IN FIELDS VIROLOGY (B.N. Fields, D.M. Knipe, P.M. Howley, eds.) (3d ed. 1996). It has recently been found that portions of the HCV genome are also homologous to pestiviruses.

Generally, with respect to morphology, Flaviviruses contain a central nucleocapsid surrounded by a lipid bilayer. Virions are spherical and have a diameter of about 40-50 nm. Their cores are about 25-30 nm in diameter. Along the outer surface of the virion envelope are projections that are about 5-10 nm long with terminal knobs about 2 nm in diameter.

The HCV genome is composed of RNA. It is known that RNA-containing viruses have relatively high rates of spontaneous mutation. Therefore, there can be multiple strains, which can be virulent or avirulent, within the HCV class or species. The ORF of HCV, including the translation spans of the core, non-structural, and envelope proteins, is shown in U.S. Patent Nos. 5,856,437 and 5,350,671.

The terms "polypeptide" and "protein" refer to a polymer of amino acid residues and are not limited to a minimum length of the product. Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. Both full-length proteins and fragments thereof are encompassed by the definition. The terms also include postexpression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation and the like. Furthermore, for purposes of the present invention, a "polypeptide" refers to a protein which includes modifications, such as deletions, additions and substitutions (generally conservative in nature), to the native sequence, so long as the protein maintains the desired activity. These modifications may be deliberate, as through site-directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the proteins or errors due to PCR amplification.

An HCV polypeptide is a polypeptide, as defined above, derived from the HCV polyprotein. The polypeptide need not be physically derived from HCV, but may be synthetically or recombinantly produced. Moreover, the polypeptide may be derived from any of the various HCV strains, such as from strains 1, 2, 3 or 4 of HCV. A number of conserved and variable regions are known between these strains and, in general, the amino acid sequences of epitopes derived from these regions will have a high degree of sequence homology, e.g., amino acid sequence homology of more than 30%, preferably more than 40%, when the two sequences are aligned and homology determined by any of the programs or algorithms described herein. Thus, for example, the term "NS4" polypeptide refers to native NS4 from any of the various HCV strains, as well as NS4 analogs, muteins and immunogenic fragments, as defined further below.
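As a rough illustration of the percent-homology thresholds above, the following Python sketch computes simple percent identity between two sequences that have already been aligned. It assumes that percent identity is an acceptable stand-in for "homology" and that gaps are written as `-`; the alignment step itself, and the particular programs or algorithms referred to in this specification, are not reproduced. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Columns in which both sequences carry a gap are skipped; every other
    column counts toward the denominator, so a gap opposite a residue
    counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information about either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * identical / compared


# Toy, hypothetical sequences differing at two of ten aligned positions.
score = percent_identity("ACDEFGHIKL", "ACDEYGHIKV")
print(score)        # 80.0
print(score > 40)   # True
```

Here two aligned 10-residue sequences that differ at two positions score 80.0, which would exceed both the 30% and the 40% figures mentioned above. How gaps are penalized is a design choice; other conventions would give different numbers.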

Further, the terms "ΔNS35," "delNS35," "ΔNS3NS5," and "ΔNS3-5" as used herein refer to a mutant polypeptide, comprising at least portions of NS3, NS4, or NS5, comprising a deletion in, or mutation of, the NS3 protease active site region to render the protease non-functional. In one embodiment, ΔNS3-5 comprises amino acids 1242-3011, as shown in FIG. 5, or polypeptides substantially homologous thereto. It will be readily apparent to one of ordinary skill in the art how to determine that NS3 protease has been rendered non-functional. If the protease is non-functional, one will obtain protein of the expected molecular weight upon expression, because the polypeptide is not cleaved into its component proteins. As set forth in Example 2 and Figure 15, a protein having a molecular weight of approximately 194 kD, as determined by 4-20% SDS-PAGE, was obtained when strain AD3 was transformed with pd.ΔNS3NS5.pj clone #5. One skilled in the art could readily determine whether a protein of the desired molecular weight was expressed for any given deletion or mutation.
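The expected molecular weight can be sanity-checked with simple arithmetic. The Python sketch below assumes an average mass of about 110 Da per amino acid residue (a common rule of thumb, not a value taken from this specification) and assumes that the expressed polypeptide spans exactly amino acids 1242-3011. It ignores any vector-derived residues and the approximate nature of SDS-PAGE mobility. The helper names are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def residue_count(start: int, end: int) -> int:
    """Number of residues in the inclusive polyprotein range start..end."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def estimated_mass_kda(n_residues: int,
                       avg_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate polypeptide mass in kilodaltons from its length."""
    return n_residues * avg_mass_da / 1000.0


# Hypothetical input: assumes the expressed polypeptide is exactly aa 1242-3011.
n = residue_count(1242, 3011)
print(n)                                 # 1770
print(round(estimated_mass_kda(n), 1))   # 194.7
```

Under these assumptions, 1,770 residues correspond to roughly 195 kD. That is in the neighborhood of the approximately 194 kD band described above, and far larger than any single cleaved NS product would be.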

The terms "analog" and "mutein" refer to biologically active derivatives of the reference molecule, or fragments of such derivatives, that retain desired activity, such as the ability to stimulate a cell-mediated immune response, as defined below. In general, the term "analog" refers to compounds having a native polypeptide sequence and structure with one or more amino acid additions, substitutions (generally conservative in nature) and/or deletions, relative to the native molecule, so long as the modifications do not destroy immunogenic activity. The term "mutein" refers to peptides having one or more peptide mimics ("peptoids"), such as those described in International Publication No. WO 91/04282. Preferably, the analog or mutein has at least the same immunoactivity as the native molecule. Methods for making polypeptide analogs and muteins are known in the art and are described further below.

Particularly preferred analogs include substitutions that are conservative in nature, i.e., those substitutions that take place within a family of amino acids that are related in their side chains. Specifically, amino acids are generally divided into four families: (1) acidic: aspartate and glutamate; (2) basic: lysine, arginine, histidine; (3) non-polar: alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar: glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified as aromatic amino acids. For example, it is reasonably predictable that an isolated replacement of leucine with isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar conservative replacement of an amino acid with a structurally related amino acid, will not have a major effect on the biological activity. For example, the polypeptide of interest may include up to about 5-10 conservative or non-conservative amino acid substitutions, or even up to about 15-25 conservative or non-conservative amino acid substitutions, or any integer between 5-25, so long as the desired function of the molecule remains intact. One of skill in the art may readily determine regions of the molecule of interest that can tolerate change by reference to Hopp/Woods and Kyte-Doolittle plots, well known in the art.
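The four side-chain families listed above can be encoded directly to flag whether a given substitution is conservative in the sense used here. This Python sketch is illustrative only. It uses one-letter codes for the twenty standard amino acids and follows the four-family grouping exactly as stated, so it does not apply the alternative aromatic grouping. Its function names are hypothetical.

```python
from __future__ import annotations

# The four side-chain families exactly as listed in the specification,
# using one-letter amino acid codes.
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "non-polar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter residue code."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"unrecognized residue code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """A substitution is treated as conservative if both residues share a family."""
    return family_of(original) == family_of(replacement)


def count_substitutions(native: str, variant: str) -> tuple[int, int]:
    """Count (conservative, non_conservative) substitutions between two
    equal-length, ungapped sequences; identical positions are ignored."""
    if len(native) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = non_conservative = 0
    for a, b in zip(native.upper(), variant.upper()):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


print(count_substitutions("LDTK", "IETR"))  # (3, 0)
print(count_substitutions("LDTK", "LKTK"))  # (0, 1)
```

In the first example, leucine to isoleucine, aspartate to glutamate, and lysine to arginine each stay within one family. In the second, aspartate to lysine crosses from the acidic to the basic family. A count like this could be compared against the 5-25 substitution ranges mentioned above, although whether function "remains intact" still has to be tested experimentally.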

By "fragment" is intended a polypeptide consisting of only a part of the intact full-length polypeptide sequence and structure. The fragment can include a C-terminal deletion and/or an N-terminal deletion of the native polypeptide. An "immunogenic fragment" of a particular HCV protein will generally include at least about 5-10 contiguous amino acid residues of the full-length molecule, preferably at least about 15-25 contiguous amino acid residues of the full-length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full-length molecule, that define an epitope, or any integer between 5 amino acids and the full-length sequence, provided that the fragment in question retains immunogenic activity, as measured by the assays described herein. For a description of various HCV epitopes, see, e.g., Chien et al., Proc. Natl. Acad. Sci. USA (1992) 89:10011-10015; Chien et al, J. Gastroent. Hepatol. (1993) 8:S33-39; Chien et al., International Publication No. WO 93/00365; Chien, D.Y., International Publication No. WO 94/01778; commonly owned, allowed U.S. Patent Application Serial Nos. 08/403,590 and 08/444,818.
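Fragments of a given number of contiguous residues, as described above, can be enumerated systematically, for example to prepare a set of overlapping peptides for screening. The Python sketch below is a minimal illustration under assumed parameters. The window length and step are arbitrary choices, the example sequence is a toy string rather than an HCV sequence, and `overlapping_peptides` is a hypothetical name.

```python
from __future__ import annotations


def overlapping_peptides(sequence: str, length: int, step: int = 1) -> list[tuple[int, str]]:
    """Return (1-based start position, peptide) pairs for each window of
    `length` contiguous residues, advancing `step` residues at a time.

    Windows that would run past the end of the sequence are omitted, so a
    sequence shorter than `length` yields an empty list.
    """
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive integers")
    return [
        (start + 1, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


# Toy, hypothetical 10-residue sequence (not an HCV sequence).
for position, peptide in overlapping_peptides("ACDEFGHIKL", length=5, step=2):
    print(position, peptide)
# 1 ACDEF
# 3 DEFGH
# 5 FGHIK
```

A step smaller than the window length makes neighboring peptides overlap. This reduces the chance that a short linear epitope is split across two peptides, at the cost of synthesizing more peptides.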

The term "epitope" as used herein refers to a sequence of at least about 3 to 5, preferably about 5 to 10 or 15, and not more than about 1,000 amino acids (or any integer therebetween), which defines a sequence that, by itself or as part of a larger sequence, binds to an antibody generated in response to such sequence. There is no critical upper limit to the length of the fragment, which may comprise nearly the full-length of the protein sequence, or even a fusion protein comprising two or more epitopes from the HCV polyprotein. An epitope for use in the subject invention is not limited to a polypeptide having the exact sequence of the portion of the parent protein from which it is derived. Indeed, viral genomes are in a state of constant flux and contain several variable domains which exhibit relatively high degrees of variability between isolates. Thus the term "epitope" encompasses sequences identical to the native sequence, as well as modifications to the native sequence, such as deletions, additions and substitutions (generally conservative in nature).

Regions of a given polypeptide that include an epitope can be identified using any number of epitope mapping techniques, well known in the art. See, e.g., Epitope Mapping Protocols in Methods in Molecular Biology, Vol. 66 (Glenn E. Morris, Ed., 1996) Humana Press, Totowa, New Jersey. For example, linear epitopes may be determined by, e.g., concurrently synthesizing large numbers of peptides on solid supports, the peptides corresponding to portions of the protein molecule, and reacting the peptides with antibodies while the peptides are still attached to the supports. Such techniques are known in the art and described in, e.g., U.S. Patent No. 4,708,871; Geysen et al. (1984) Proc. Natl. Acad. Sci. USA 81:3998-4002; Geysen et al. (1986) Molec. Immunol. 23:709-715, all incorporated herein by reference in their entireties. Similarly, conformational epitopes are readily identified by determining spatial conformation of amino acids, such as by, e.g., x-ray crystallography and 2-dimensional nuclear magnetic resonance. See, e.g., Epitope Mapping Protocols, supra. Antigenic regions of proteins can also be identified using standard antigenicity and hydropathy plots, such as those calculated using, e.g., the Omiga version 1.0 software program available from the Oxford Molecular Group. This computer program employs the Hopp/Woods method, Hopp et al., Proc. Natl. Acad. Sci. USA (1981) 78:3824-3828, for determining antigenicity profiles, and the Kyte-Doolittle technique, Kyte et al., J. Mol. Biol. (1982) 157:105-132, for hydropathy plots.
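A Kyte-Doolittle hydropathy plot of the kind mentioned above is a sliding-window average of per-residue hydropathy values. The Python sketch below uses the published Kyte-Doolittle scale. The default window of 9 residues is an assumption, and longer windows smooth the profile more. The sketch does not reproduce the Omiga program or the Hopp/Woods antigenicity scale. The function name and the example sequence are hypothetical.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982), one-letter codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Sliding-window mean Kyte-Doolittle hydropathy.

    Element i of the result is the mean over residues i+1 .. i+window
    (1-based). Non-standard residue codes raise KeyError.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Toy, hypothetical sequence: a hydrophobic run followed by a charged run.
print([round(v, 2) for v in hydropathy_profile("IIIKKK", window=3)])
# [4.5, 1.7, -1.1, -3.9]
```

Strongly negative (hydrophilic) stretches in such a profile are often examined as candidate surface-exposed, and therefore potentially antigenic, regions. Any region flagged this way would still need to be confirmed experimentally.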

As used herein, the term "conformational epitope" refers to a portion of a full-length protein, or an analog or mutein thereof, having structural features native to the amino acid sequence encoding the epitope within the full-length natural protein. Native structural features include, but are not limited to, glycosylation and three-dimensional structure. Preferably, a conformational epitope is produced recombinantly and is expressed in a cell from which it is extractable under conditions which preserve its desired structural features, e.g., without denaturation of the epitope. Such cells include bacteria, yeast, insect, and mammalian cells. Expression and isolation of recombinant conformational epitopes from the HCV polyprotein are described in, e.g., International Publication Nos. WO 96/04301, WO 94/01778, WO 95/33053, WO 92/08734, which applications are herein incorporated by reference in their entirety.

An "immunological response" to an HCV antigen (including both polypeptides and
polynucleotides encoding polypeptides that are expressed in vivo) or composition is the
development in a subject of a humoral and/or a cellular immune response to molecules
present in the composition of interest. For purposes of the present invention, a "humoral
immune response" refers to an immune response mediated by antibody molecules, while a
"cellular immune response" is one mediated by T-lymphocytes and/or other white blood
cells. One important aspect of cellular immunity involves an antigen-specific response by
cytolytic T-cells ("CTLs"). CTLs have specificity for peptide antigens that are presented
in association with proteins encoded by the major histocompatibility complex (MHC) and
expressed on the surfaces of cells. CTLs help induce and promote the intracellular
destruction of intracellular microbes, or the lysis of cells infected with such microbes.
Another aspect of cellular immunity involves an antigen-specific response by helper
T-cells. Helper T-cells act to help stimulate the function, and focus the activity, of
nonspecific effector cells against cells displaying peptide antigens in association with
MHC molecules on their surface. A "cellular immune response" also refers to the
production of cytokines, chemokines and other such molecules produced by activated
T-cells and/or other white blood cells, including those derived from CD4+ and CD8+ T-cells.

A composition or vaccine that elicits a cellular immune response may serve to
sensitize a vertebrate subject by the presentation of antigen in association with MHC
molecules at the cell surface. The cell-mediated immune response is directed at, or near,
cells presenting antigen at their surface. In addition, antigen-specific T-lymphocytes can
be generated to allow for the future protection of an immunized host.

The ability of a particular antigen to stimulate a cell-mediated immunological
response may be determined by a number of assays, such as lymphoproliferation
(lymphocyte activation) assays, CTL cytotoxic cell assays, or assays for T-lymphocytes
specific for the antigen in a sensitized subject. Such assays are well known in the art.
See, e.g., Erickson et al., J. Immunol. (1993) 151:4189-4199; Doe et al., Eur. J.
Immunol. (1994) 24:2369-2376; and the examples below.

Thus, an immunological response as used herein may be one which stimulates the
production of CTLs and/or the production or activation of helper T-cells. The antigen of
interest may also elicit an antibody-mediated immune response. Hence, an immunological
response may include one or more of the following effects: the production of antibodies by
B-cells; and/or the activation of suppressor T-cells and/or γδ T-cells directed specifically
to an antigen or antigens present in the composition or vaccine of interest. These responses
may serve to neutralize infectivity and/or mediate antibody-complement or antibody-dependent
cell cytotoxicity (ADCC) to provide protection or alleviation of symptoms to an
immunized host. Such responses can be determined using standard immunoassays and
neutralization assays, well known in the art.

A "coding sequence" or a sequence which "encodes" a selected polypeptide is a
nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case
of mRNA) into a polypeptide in vitro or in vivo when placed under the control of
appropriate regulatory sequences. The boundaries of the coding sequence are determined
by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy)
terminus. A transcription termination sequence may be located 3' to the coding sequence.

A "nucleic acid" molecule or "polynucleotide" can include both double- and single-stranded
sequences and refers to, but is not limited to, cDNA from viral, procaryotic or
eucaryotic mRNA, genomic DNA sequences from viral (e.g., DNA viruses and
retroviruses) or procaryotic DNA, and especially synthetic DNA sequences. The term also
captures sequences that include any of the known base analogs of DNA and RNA.

"Operably linked" refers to an arrangement of elements wherein the components so
described are configured so as to perform their desired function. Thus, a given promoter
operably linked to a coding sequence is capable of effecting the expression of the coding
sequence when the proper transcription factors, etc., are present. The promoter need not be
contiguous with the coding sequence, so long as it functions to direct the expression
thereof. Thus, for example, intervening untranslated yet transcribed sequences can be
present between the promoter sequence and the coding sequence, as can transcribed
introns, and the promoter sequence can still be considered "operably linked" to the coding
sequence.

"Recombinant" as used herein to describe a nucleic acid molecule means a
polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by
virtue of its origin or manipulation, is not associated with all or a portion of the
polynucleotide with which it is associated in nature. The term "recombinant" as used with
respect to a protein or polypeptide means a polypeptide produced by expression of a
recombinant polynucleotide. In general, the gene of interest is cloned and then expressed
in transformed organisms, as described further below. The host organism expresses the
foreign gene to produce the protein under expression conditions.

A "control element" refers to a polynucleotide sequence which aids in the
expression of a coding sequence to which it is linked. The term includes promoters,
transcription termination sequences, upstream regulatory domains, polyadenylation signals,
untranslated regions, including 5'-UTRs and 3'-UTRs, and, when appropriate, leader
sequences and enhancers, which collectively provide for the transcription and translation of
a coding sequence in a host cell.

A "promoter" as used herein is a DNA regulatory region capable of binding RNA
polymerase in a host cell and initiating transcription of a downstream (3' direction) coding
sequence operably linked thereto. For purposes of the present invention, a promoter
sequence includes the minimum number of bases or elements necessary to initiate
transcription of a gene of interest at levels detectable above background. Within the
promoter sequence is a transcription initiation site, as well as protein binding domains
(consensus sequences) responsible for the binding of RNA polymerase. Eucaryotic
promoters will often, but not always, contain "TATA" boxes and "CAT" boxes.

A control sequence "directs the transcription" of a coding sequence in a cell when
RNA polymerase will bind the promoter sequence and transcribe the coding sequence into
mRNA, which is then translated into the polypeptide encoded by the coding sequence.

"Expression cassette" or "expression construct" refers to an assembly which is
capable of directing the expression of the sequence(s) or gene(s) of interest. The
expression cassette includes control elements, as described above, such as a promoter
which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s) of
interest, and often includes a polyadenylation sequence as well. Within certain
embodiments of the invention, the expression cassette described herein may be contained
within a plasmid construct. In addition to the components of the expression cassette, the
plasmid construct may also include one or more selectable markers, a signal which allows
the plasmid construct to exist as single-stranded DNA (e.g., an M13 origin of replication), at
least one multiple cloning site, and a "mammalian" origin of replication (e.g., an SV40 or
adenovirus origin of replication).

"Transformation," as used herein, refers to the insertion of an exogenous
polynucleotide into a host cell, irrespective of the method used for insertion: for example,
transformation by direct uptake, transfection, infection, and the like. For particular
methods of transfection, see further below. The exogenous polynucleotide may be
maintained as a nonintegrated vector, for example, an episome, or alternatively, may be
integrated into the host genome.

A "host cell" is a cell which has been transformed, or is capable of transformation,
by an exogenous DNA sequence.

By "isolated" is meant, when referring to a polypeptide, that the indicated molecule
is separate and discrete from the whole organism with which the molecule is found in
nature or is present in the substantial absence of other biological macromolecules of the
same type. The term "isolated" with respect to a polynucleotide is a nucleic acid molecule
devoid, in whole or part, of sequences normally associated with it in nature; or a sequence,
as it exists in nature, but having heterologous sequences in association therewith; or a
molecule disassociated from the chromosome.

The term "purified" as used herein preferably means at least 75% by weight, more
preferably at least 85% by weight, more preferably still at least 95% by weight, and most
preferably at least 98% by weight, of biological macromolecules of the same type are
present.

"Homology" refers to the percent identity between two polynucleotide or two
polypeptide moieties. Two DNA, or two polypeptide, sequences are "substantially
homologous" to each other when the sequences exhibit at least about 50%, preferably at
least about 75%, more preferably at least about 80%-85%, preferably at least about 90%,
and most preferably at least about 95%-98%, or more, sequence identity over a defined
length of the molecules. As used herein, substantially homologous also refers to sequences
showing complete identity to the specified DNA or polypeptide sequence. The term
"substantially homologous" as used herein in reference to ANS35 generally refers to an
HCV nucleic or amino acid sequence that is at least 60% identical to the entire sequence of
the polypeptide encoded by ANS35 (see FIG. 5), where the sequence identity is preferably
at least 75%, more preferably at least 80%, still more preferably at least about 85%,
especially more than about 90%, most preferably 95% or greater, particularly 98% or
greater. These homologous polypeptides include fragments, including mutants and allelic
variants of the fragments. Identity between the two sequences is preferably determined by
the Smith-Waterman homology search algorithm as implemented in the MPSRCH program
(Oxford Molecular), using an affine gap search with parameters gap open penalty=12 and
gap extension penalty=1. Thus, for example, the present invention includes an isolate
which is 80% identical to a polypeptide encoded by ANS35. In some aspects of the
invention, the polypeptide of the present invention is substantially homologous to ANS35.
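To make the affine-gap parameters concrete, the following sketch computes a Smith-Waterman local alignment score with affine gaps using Gotoh's three-matrix recurrence. It is not MPSRCH: it assumes a simple match/mismatch score in place of a substitution table, and it assumes that a gap of length k costs gap_open + (k - 1) × gap_extend, a convention that differs between programs. Only the best score is returned (no traceback), and the function name is hypothetical.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Best local alignment score of a and b with affine gap penalties.

    H[i][j]: best local score ending at cell (i, j) in any state, floored at 0.
    E[i][j]: best score ending with b[j-1] placed opposite a gap.
    F[i][j]: best score ending with a[i-1] placed opposite a gap.
    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    """
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing gap.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return int(best)
```

With a large opening cost relative to the extension cost, as in the 12/1 setting, one long gap is penalized far less than several short ones, which is the purpose of an affine scheme.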

In general, "identity" refers to an exact nucleotide-to-nucleotide or amino acid-to-amino
acid correspondence of two polynucleotides or polypeptide sequences, respectively.
Percent identity can be determined by a direct comparison of the sequence information
between two molecules by aligning the sequences, counting the exact number of matches
between the two aligned sequences, dividing by the length of the shorter sequence, and
multiplying the result by 100. Readily available computer programs can be used to aid in
the analysis, such as ALIGN, Dayhoff, M.O. in Atlas of Protein Sequence and Structure,
M.O. Dayhoff ed., 5 Suppl. 3:353-358, National Biomedical Research Foundation,
Washington, DC, which adapts the local homology algorithm of Smith and Waterman,
Advances in Appl. Math. 2:482-489, 1981, for peptide analysis. Programs for determining
nucleotide sequence identity are available in the Wisconsin Sequence Analysis Package,
Version 8 (available from Genetics Computer Group, Madison, WI), for example, the
BESTFIT, FASTA and GAP programs, which also rely on the Smith and Waterman
algorithm. These programs are readily utilized with the default parameters recommended
by the manufacturer and described in the Wisconsin Sequence Analysis Package referred to
above. For example, percent identity of a particular nucleotide sequence to a reference
sequence can be determined using the homology algorithm of Smith and Waterman with a
default scoring table and a gap penalty of six nucleotide positions.
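The direct-comparison recipe described above can be sketched in a few lines. This assumes the two sequences have already been aligned, with gaps written as "-"; the alignment step itself and the helper name `percent_identity` are illustrative assumptions, and gap positions are never counted as matches.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Counts aligned positions carrying the same residue, divides by the
    length of the shorter (ungapped) sequence and multiplies by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    shorter = min(len(aligned_a.replace(gap, "")),
                  len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter
```

For example, the alignment of "ACGT-ACGT" with "ACGTTACCT" has 7 matching positions and a shorter ungapped length of 8, giving 87.5%.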

Another method of establishing percent identity in the context of the present
invention is to use the MPSRCH package of programs copyrighted by the University of
Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by
IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the
Smith-Waterman algorithm can be employed where default parameters are used for the scoring
table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six).
From the data generated, the "Match" value reflects "sequence identity." Other suitable
programs for calculating the percent identity or similarity between sequences are generally
known in the art; for example, another alignment program is BLAST, used with default
parameters. For example, BLASTN and BLASTP can be run with the following default
parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10;
Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases =
non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss
protein + Spupdate + PIR. Details of these programs can be found at the following internet
address: http://www.ncbi.nlm.gov/cgi-bin/BLAST.

Alternatively, homology can be determined by hybridization of polynucleotides
under conditions which form stable duplexes between homologous regions, followed by
digestion with single-stranded-specific nuclease(s), and size determination of the digested
fragments. DNA sequences that are substantially homologous can be identified in a
Southern hybridization experiment under, for example, stringent conditions, as defined for
that particular system. Defining appropriate hybridization conditions is within the skill of
the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid
Hybridization, supra.

"Stringency" refers to conditions in a hybridization reaction that favor association
of very similar sequences over sequences that differ. For example, the combination of
temperature and salt concentration should be chosen so that it is approximately 12°C to
20°C below the calculated Tm of the hybrid under study. The temperature and salt
conditions can often be determined empirically in preliminary experiments in which
samples of genomic DNA immobilized on filters are hybridized to the sequence of interest
and then washed under conditions of different stringencies. See Sambrook et al. at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the
complexity of the DNA being blotted and (2) the homology between the probe and the
sequences being detected. The total amount of the fragment(s) to be studied can vary by a
magnitude of 10, from 0.1 to 1 µg for a plasmid or phage digest to 10^-9 to 10^-8 g for a
single-copy gene in a highly complex eukaryotic genome. For lower complexity
polynucleotides, substantially shorter blotting, hybridization, and exposure times, a smaller
amount of starting polynucleotides, and lower specific activity of probes can be used. For
example, a single-copy yeast gene can be detected with an exposure time of only 1 hour
starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with
a probe of 10^8 cpm/µg. For a single-copy mammalian gene, a conservative approach would
start with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10%
dextran sulfate using a probe of greater than 10^8 cpm/µg, resulting in an exposure time of
~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid
between the probe and the fragment of interest, and consequently, the appropriate
conditions for hybridization and washing. In many cases the probe is not 100%
homologous to the fragment. Other commonly encountered variables include the length
and total G+C content of the hybridizing sequences and the ionic strength and formamide
content of the hybridization buffer. The effects of all of these factors can be approximated
by a single equation:

$$T_m = 81 + 16.6\,\log_{10} C_i + 0.4\,[\%(G+C)] - 0.6\,(\%\,\mathrm{formamide}) - \frac{600}{n} - 1.5\,(\%\,\mathrm{mismatch})$$

where $C_i$ is the salt concentration (monovalent ions) and $n$ is the length of the hybrid in
base pairs (slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138:267-284).
In general, convenient hybridization temperatures in the presence of 50% formamide
are 42°C for a probe that is 95% to 100% homologous to the target fragment, 37°C for
90% to 95% homology, and 32°C for 85% to 90% homology. For lower homologies,
formamide content should be lowered and temperature adjusted accordingly, using the
equation above. If the homology between the probe and the target fragment is not known,
the simplest approach is to start with both hybridization and wash conditions which are
nonstringent. If non-specific bands or high background are observed after
autoradiography, the filter can be washed at high stringency and reexposed. If the time
required for exposure makes this approach impractical, several hybridization and/or
washing stringencies should be tested in parallel.
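The Tm equation can be evaluated directly. This is a sketch assuming $C_i$ is given in mol/L and that all percentages are numbers from 0 to 100; the function name `hybrid_tm` and the example conditions are illustrative, not taken from the text.

```python
import math


def hybrid_tm(salt_molar, gc_percent, formamide_percent, length_bp,
              mismatch_percent=0.0):
    """Estimated Tm (deg C) of a DNA-DNA hybrid from the equation above.

    salt_molar        -- monovalent cation concentration Ci (assumed mol/L)
    gc_percent        -- G+C content of the hybrid, in percent
    formamide_percent -- formamide in the hybridization buffer, in percent
    length_bp         -- hybrid length n, in base pairs
    mismatch_percent  -- percent of mismatched base pairs
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


# Illustrative conditions: 1 M monovalent salt, 50% G+C, 50% formamide, 600 bp.
perfect = hybrid_tm(1.0, 50, 50, 600)         # 81 + 0 + 20 - 30 - 1 = 70.0
mismatched = hybrid_tm(1.0, 50, 50, 600, 10)  # 70.0 - 15.0 = 55.0
```

Each percent of mismatch lowers the estimate by 1.5°C, so in this example a probe with 10% mismatch melts about 15°C lower than a perfectly matched one.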

By "nucleic acid immunization" is meant the introduction of a nucleic acid
molecule encoding one or more selected antigens into a host cell, for the in vivo expression
of the antigen or antigens. The nucleic acid molecule can be introduced directly into the
recipient subject, such as by injection, inhalation, oral, intranasal and mucosal
administration, or the like, or can be introduced ex vivo, into cells which have been
removed from the host. In the latter case, the transformed cells are reintroduced into the
subject where an immune response can be mounted against the antigen encoded by the
nucleic acid molecule.

An "open reading frame" or ORF is a region of a polynucleotide sequence which
encodes a polypeptide; this region can represent a portion of a coding sequence or a total
coding sequence.
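Following these definitions, an ORF can be located by scanning each reading frame from a start codon to the next in-frame stop codon. This is a minimal sketch, assuming the standard genetic code (ATG start; TAA, TAG and TGA stops), the forward strand only, and skipping ATGs nested inside a longer ORF in the same frame. `find_orfs` is a hypothetical helper, not a tool referenced in this application.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return sorted (start, end) spans of ATG...stop ORFs on the forward strand.

    start is the 0-based index of the A in ATG; end is exclusive and includes
    the stop codon. min_codons counts sense codons (ATG included, stop excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon.
                j = i
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(dna):
                    break  # no stop codon downstream in this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
                continue
            i += 3
    return sorted(orfs)
```

In "CCATGAAATGATAG", for instance, the only complete ORF is ATG AAA TGA in the third frame, spanning positions 2 to 11; the ATG at position 7 has no downstream in-frame stop and is not reported.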

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides
which comprise at least one antigen binding site. An "antigen binding site" is formed from
the folding of the variable domains of an antibody molecule(s) to form three-dimensional
binding sites with an internal surface shape and charge distribution complementary to the
features of an epitope of an antigen, which allows specific binding to form an
antibody-antigen complex. An antigen binding site may be formed from a heavy- and/or
light-chain domain (VH and VL, respectively), which form hypervariable loops which
contribute to antigen binding. The term "antibody" includes, without limitation, polyclonal
antibodies, monoclonal antibodies, chimeric antibodies, altered antibodies, univalent
antibodies, Fab proteins, and single-domain antibodies. In many cases, the binding of
antibodies to antigens is equivalent to other ligand/anti-ligand binding.

If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit, goat,
horse, etc.) is immunized with an immunogenic polypeptide bearing an HCV epitope(s).
Serum from the immunized animal is collected and treated according to known procedures.
If serum containing polyclonal antibodies to an HCV epitope contains antibodies to other
antigens, the polyclonal antibodies can be purified by immunoaffinity chromatography.
Techniques for producing and processing polyclonal antisera are known in the art; see, for
example, Mayer and Walker, eds. (1987) IMMUNOCHEMICAL METHODS IN CELL
AND MOLECULAR BIOLOGY (Academic Press, London).

Monoclonal antibodies directed against HCV epitopes can also be readily produced
by one skilled in the art. The general methodology for making monoclonal antibodies by
hybridomas is well known. Immortal antibody-producing cell lines can be created by cell
fusion, and also by other techniques such as direct transformation of B lymphocytes with
oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g., M. Schreier et al.
(1980) HYBRIDOMA TECHNIQUES; Hammerling et al. (1981), MONOCLONAL
ANTIBODIES AND T-CELL HYBRIDOMAS; Kennett et al. (1980) MONOCLONAL
ANTIBODIES; see also U.S. Pat. Nos. 4,341,761; 4,399,121; 4,427,783; 4,444,887;
4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of monoclonal antibodies
produced against HCV epitopes can be screened for various properties, e.g., for isotype,
epitope affinity, etc. As used herein, a "single domain antibody" (dAb) is an antibody
which is comprised of a VH domain, which binds specifically with a designated antigen.
A dAb does not contain a VL domain, but may contain other antigen binding domains
known to exist in antibodies, for example, the kappa and lambda domains. Methods for
preparing dAbs are known in the art. See, for example, Ward et al., Nature 341:544 (1989).

Antibodies can also be comprised of VH and VL domains, as well as other known
antigen binding domains. Examples of these types of antibodies and methods for their
preparation are known in the art (see, e.g., U.S. Pat. No. 4,816,467, which is incorporated
herein by reference), and include the following. For example, "vertebrate antibodies" refers
to antibodies which are tetramers or aggregates thereof, comprising light and heavy chains
which are usually aggregated in a "Y" configuration and which may or may not have
covalent linkages between the chains. In vertebrate antibodies, the amino acid sequences of
the chains are homologous with those sequences found in antibodies produced in
vertebrates, whether in situ or in vitro (for example, in hybridomas). Vertebrate antibodies
include, for example, purified polyclonal antibodies and monoclonal antibodies, methods
for the preparation of which are described infra.

"Hybrid antibodies" are antibodies where chains are separately homologous with
reference to mammalian antibody chains and represent novel assemblies of them, so that
two different antigens are precipitable by the tetramer or aggregate. In hybrid antibodies,
one pair of heavy and light chains is homologous to those found in an antibody raised
against a first antigen, while a second pair of chains is homologous to those found in an
antibody raised against a second antigen. This results in the property of "divalence", i.e.,
the ability to bind two antigens simultaneously. Such hybrids can also be formed using
chimeric chains, as set forth below.

"Chimeric antibodies" refers to antibodies in which the heavy and/or light chains
are fusion proteins. Typically, one portion of the amino acid sequence of the chain is
homologous to corresponding sequences in an antibody derived from a particular species or
a particular class, while the remaining segment of the chain is homologous to the
sequences derived from another species and/or class. Usually, the variable region of both
light and heavy chains mimics the variable regions of antibodies derived from one species
of vertebrates, while the constant portions are homologous to the sequences in the
antibodies derived from another species of vertebrates. However, the definition is not
limited to this particular example. Also included is any antibody in which either or both of
the heavy or light chains are composed of combinations of sequences mimicking the
sequences in antibodies of different sources, whether these sources be from differing
classes or different species of origin, and whether or not the fusion point is at the
variable/constant boundary. Thus, it is possible to produce antibodies in which neither the
constant nor the variable region mimics known antibody sequences. It then becomes
possible, for example, to construct antibodies whose variable region has a higher specific
affinity for a particular antigen, or whose constant region can elicit enhanced complement
fixation, or to make other improvements in properties possessed by a particular constant
region.

Another example is "altered antibodies", which refers to antibodies in which the
naturally occurring amino acid sequence in a vertebrate antibody has been varied. Utilizing
recombinant DNA techniques, antibodies can be redesigned to obtain desired
characteristics. The possible variations are many, and range from the changing of one or
more amino acids to the complete redesign of a region, for example, the constant region.
Changes in the constant region will, in general, be made to attain desired cellular process
characteristics, e.g., changes in complement fixation, interaction with membranes, and other
effector functions. Changes in the variable region can be made to alter antigen binding
characteristics. The antibody can also be engineered to aid the specific delivery of a
molecule or substance to a specific cell or tissue site. The desired alterations can be made
by known techniques in molecular biology, e.g., recombinant techniques, site-directed
mutagenesis, etc.

Yet another example is "univalent antibodies", which are aggregates comprised of
a heavy-chain/light-chain dimer bound to the Fc (i.e., stem) region of a second heavy
chain. This type of antibody escapes antigenic modulation. See, e.g., Glennie et al., Nature
295:712 (1982). Included also within the definition of antibodies are "Fab" fragments of
antibodies. The "Fab" region refers to those portions of the heavy and light chains which
are roughly equivalent, or analogous, to the sequences which comprise the branch portion
of the heavy and light chains, and which have been shown to exhibit immunological
binding to a specified antigen, but which lack the effector Fc portion. "Fab" includes
aggregates of one heavy and one light chain (commonly known as Fab'), as well as
tetramers containing the 2H and 2L chains (referred to as F(ab)2), which are capable of
selectively reacting with a designated antigen or antigen family. Fab antibodies can be
divided into subsets analogous to those described above, i.e., "vertebrate Fab", "hybrid
Fab", "chimeric Fab", and "altered Fab". Methods of producing Fab fragments of
antibodies are known within the art and include, for example, proteolysis and synthesis by
recombinant techniques.

"Antigen-antibody complex" refers to the complex formed by an antibody that is
specifically bound to an epitope on an antigen.

"Immunogenic polypeptide" refers to a polypeptide that elicits a cellular and/or
humoral immune response in a mammal, whether alone or linked to a carrier, in the
presence or absence of an adjuvant.

"Antigenic determinant" refers to the site on an antigen or hapten to which a 
specific antibody molecule or specific cell surface receptor binds. 

As used herein, "treatment" refers to any of (i) the prevention of infection or
reinfection, as in a traditional vaccine, (ii) the reduction or elimination of symptoms, and
(iii) the substantial or complete elimination of the pathogen in question. Treatment may be
effected prophylactically (prior to infection) or therapeutically (following infection).

By "vertebrate subject" is meant any member of the subphylum Chordata, including,
without limitation, humans and other primates, including non-human primates such as
chimpanzees and other apes and monkey species; farm animals such as cattle, sheep, pigs,
goats and horses; domestic mammals such as dogs and cats; laboratory animals including
rodents such as mice, rats and guinea pigs; birds, including domestic, wild and game birds
such as chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. The
term does not denote a particular age. Thus, both adult and newborn individuals are
intended to be covered. The invention described herein is intended for use in any of the
above vertebrate species, since the immune systems of all of these vertebrates operate
similarly.

II. Modes of Carrying out the Invention

Before describing the present invention in detail, it is to be understood that this
invention is not limited to particular formulations or process parameters, as such may, of
course, vary. It is also to be understood that the terminology used herein is for the purpose
of describing particular embodiments of the invention only, and is not intended to be
limiting.

Although a number of compositions and methods similar or equivalent to those
described herein can be used in the practice of the present invention, the preferred materials
and methods are described herein.



General Overview 

An aim of an HCV vaccine is to generate broad immunity to a wide breadth of
antigens, because HCV is so divergent and because humoral as well as cellular immune
responses are desirable to combat this human pathogen. While antibodies generated
against the envelope glycoprotein(s) might aid in virus neutralization, there is additional
benefit to be derived from a vaccine that includes other regions. T-helper responses
generated against a polypeptide would be helpful in a vaccine setting, as would the
generation of cytotoxic T cells. The non-structural region represents such a candidate
antigen, but processing by the NS3 protease generates several polypeptides, making
purification complicated. It would be advantageous, therefore, to derive a non-structural
cassette that is unprocessed by the NS3 protease.

The present invention solves this and other problems using compositions and 
30 methods involving an N-terminal deletion in NS3, which removes the catalytic domain. 
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As such, some or all of the remainder of the non-structural region (through NS5B) is 
expressed as an intact polypeptide. Expression of this species has been documented in 
mammalian cells as well as in yeast. Further, in certain aspects, polynucleotides encoding 
HCV core polypeptides (or fragments thereof) are added (e.g. operably linked) to the 
carboxy-terminus of the non-structural cassette. As the core coding region is relatively 
highly conserved among HCV isolates, the presence of this region may enhance the 
immune response. Because core has at its C-terminus a very hydrophobic domain (amino 
acids 174-191), shorter versions of core were also engineered onto the polypeptide. As 
described in detail herein, the truncation of core to amino acid 121 yielded higher 
expression than the amino acid 173 truncation when engineered onto the C-terminus of the 
mutant NS polypeptide. The combination of most of the non-structural region fused to a 
C-terminally truncated core into a polypeptide is novel and has advantages for vaccine 
immunization. Moreover, because the aim is not necessarily to generate antibody 
responses to this polypeptide, there is no need to maintain a native conformation, enabling 
a more facile purification protocol. 
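The C-terminal core fusion described above can be illustrated at the sequence level. The following is a minimal sketch with hypothetical helper names (`truncate_cds`, `fuse_c_terminal`), not the construction procedure actually used. It assumes both inputs are in-frame DNA coding sequences, that truncating core "to amino acid 121" means keeping codons 1 through 121, and that a TAA stop codon is appended after the core fragment.

```python
# Hypothetical helpers for appending a truncated core coding sequence to the
# 3' end (protein C-terminus) of a non-structural cassette. Toy sketch only.

STOP_CODONS = {"TAA", "TAG", "TGA"}
CORE_HYDROPHOBIC_DOMAIN = (174, 191)  # core residues described in the text as very hydrophobic


def truncate_cds(cds: str, last_residue: int) -> str:
    """Keep the codons for residues 1..last_residue of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 1 <= last_residue <= len(cds) // 3:
        raise ValueError("last_residue lies outside the coding sequence")
    return cds[: 3 * last_residue]


def excludes_hydrophobic_domain(last_residue: int) -> bool:
    """True if a core truncation ends before the hydrophobic C-terminal domain."""
    return last_residue < CORE_HYDROPHOBIC_DOMAIN[0]


def fuse_c_terminal(upstream_cds: str, core_cds: str, core_last_residue: int,
                    stop: str = "TAA") -> str:
    """Join a truncated core sequence in frame after the upstream cassette."""
    if len(upstream_cds) % 3 != 0:
        raise ValueError("upstream cassette would shift the reading frame")
    if upstream_cds[-3:] in STOP_CODONS:
        upstream_cds = upstream_cds[:-3]  # drop the cassette's own stop codon
    return upstream_cds + truncate_cds(core_cds, core_last_residue) + stop


if __name__ == "__main__":
    # Toy sequences, not HCV sequences.
    print(fuse_c_terminal("ATGGCTTAA", "ATGAGCACG", 2))
    print(excludes_hydrophobic_domain(121), excludes_hydrophobic_domain(173))
```

Because only codon counts matter here, a truncation at residue 121 keeps 363 nucleotides of core and a truncation at residue 173 keeps 519; both end before residue 174.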

Mutant HCV Non-Structural Polypeptides 

Genomes of HCV strains contain a single open reading frame of approximately
9,000 to 12,000 nucleotides, which is translated into a polyprotein. An HCV polyprotein
is cleaved to produce at least ten distinct products, in the order NH2-Core-E1-E2-p7-
NS2-NS3-NS4a-NS4b-NS5a-NS5b-COOH. Mutant HCV polypeptides of the invention
contain an N-terminal deletion in NS3, which removes or disables the catalytic domain.
Preferably, the polypeptides also include the remainder of the non-structural region,
although in certain embodiments, the polypeptides may include less than all of the
remaining NS polypeptides, for example mutant NS polypeptides including any
combination of NS2-NS3-NS4a-NS4b-NS5a-NS5b (e.g., NS3-NS5a-NS5b;
NS3-NS4a-NS4b; NS3-NS4a-NS4b-NS5a; NS3-NS4b-NS5a-NS5b; NS3-NS4a-NS5a;
NS3-NS4b-NS5a; NS3-NS4b-NS5b; etc.).
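The NS3-containing cassettes listed above can be enumerated systematically. This is a minimal sketch that assumes, for illustration only, that each cassette begins with NS3 and keeps the remaining regions in their native polyprotein order; the text also allows other orders and NS2-containing combinations, which this sketch does not generate.

```python
from itertools import combinations
from typing import List

# Regions downstream of NS3, in polyprotein order.
DOWNSTREAM_REGIONS = ("NS4a", "NS4b", "NS5a", "NS5b")


def ns3_led_cassettes() -> List[str]:
    """Every NS3-led combination that keeps downstream regions in native order."""
    cassettes = []
    for k in range(len(DOWNSTREAM_REGIONS) + 1):
        # combinations() yields items in the order they appear in the input,
        # so polyprotein order is preserved within each cassette.
        for subset in combinations(DOWNSTREAM_REGIONS, k):
            cassettes.append("-".join(("NS3",) + subset))
    return cassettes


if __name__ == "__main__":
    for cassette in ns3_led_cassettes():
        print(cassette)
```

This yields 16 cassettes (one per subset of the four downstream regions), from NS3 alone to NS3-NS4a-NS4b-NS5a-NS5b, and each of the NS3-led examples listed in the paragraph is among them.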

The HCV NS3 protein functions as a protease and a helicase and occurs at
approximately amino acid 1027 to amino acid 1657 of the polyprotein (numbered relative
to HCV-1). See Choo et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455. HCV NS4
occurs at approximately amino acid 1658 to amino acid 1972, NS5a occurs at
approximately amino acid 1973 to amino acid 2420, and HCV NS5b occurs at
approximately amino acid 2421 to amino acid 3011 of the polyprotein (numbered relative
to HCV-1) (Choo et al., 1991).
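The approximate boundaries above (numbered relative to HCV-1), together with the core region at amino acids 1-191 described elsewhere in this section, can be kept in a small lookup table for annotating positions. This sketch uses only those approximate values; E1, E2, p7 and NS2 are omitted because no coordinates are given here, so positions in that interval return `None`.

```python
from typing import Optional

# Approximate boundaries (1-based, inclusive), numbered relative to HCV-1, as given in the text.
HCV1_REGIONS = (
    ("core", 1, 191),
    ("NS3", 1027, 1657),
    ("NS4", 1658, 1972),
    ("NS5a", 1973, 2420),
    ("NS5b", 2421, 3011),
)


def region_of(position: int) -> Optional[str]:
    """Name of the listed region containing `position`, or None if not covered here."""
    for name, start, end in HCV1_REGIONS:
        if start <= position <= end:
            return name
    return None


def region_length(name: str) -> int:
    """Number of residues in a listed region (inclusive coordinates)."""
    for region, start, end in HCV1_REGIONS:
        if region == name:
            return end - start + 1
    raise KeyError(name)


if __name__ == "__main__":
    print(region_of(1500), region_length("NS3"))
```

With inclusive coordinates, NS3 spans 631 residues and NS5b spans 591.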

The mutant polypeptides described herein can either be full-length polypeptides or
portions of NS3, NS4 (NS4a and NS4b), NS5a, and NS5b polypeptides. Epitopes of NS3,
NS4 (NS4a and NS4b), NS5a, NS5b, NS3NS4NS5a, and NS3NS4NS5aNS5b can be
identified by several methods. For example, NS3, NS4, NS5a, NS5b polypeptides or
fusion proteins comprising any combination of the above can be isolated, for example, by
immunoaffinity purification using a monoclonal antibody for the polypeptide or protein.
The isolated protein sequence can then be screened by preparing a series of short peptides
by proteolytic cleavage of the purified protein, which together span the entire protein
sequence. By starting with, for example, 100-mer polypeptides, each polypeptide can be
tested for the presence of epitopes recognized by a T cell receptor on an HCV-activated T
cell, and progressively smaller and overlapping fragments can then be tested from an
identified 100-mer to map the epitope of interest.
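The progressive narrowing described above can be written as a loop. In this sketch the assay is a hypothetical callable `is_positive(window)` standing in for a T cell readout such as a ⁵¹Cr release or lymphoproliferation assay; the window widths (100, 50, 25 and 15 residues) and the assumption that an epitope is at most 10 residues long are illustrative choices. Consecutive windows overlap by `max_epitope_len - 1` residues, so any epitope up to that length lies wholly inside at least one window.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int]  # 1-based, inclusive residue range


def tile(start: int, end: int, width: int, step: int) -> List[Window]:
    """Overlapping windows of up to `width` residues, advancing by `step`, covering start..end."""
    if not 0 < step <= width:
        raise ValueError("need 0 < step <= width so that no residues are skipped")
    windows: List[Window] = []
    pos = start
    while True:
        stop = min(pos + width - 1, end)
        windows.append((pos, stop))
        if stop == end:
            return windows
        pos += step


def map_epitope(protein_len: int,
                is_positive: Callable[[Window], bool],
                widths: Sequence[int] = (100, 50, 25, 15),
                max_epitope_len: int = 10) -> Window:
    """Narrow down to the smallest tested window that still scores positive."""
    region: Window = (1, protein_len)
    for width in widths:
        # Overlap of max_epitope_len - 1 residues keeps any short epitope whole in some window.
        step = max(1, width - max_epitope_len + 1)
        hits = [w for w in tile(region[0], region[1], width, step) if is_positive(w)]
        if not hits:
            break  # nothing positive at this resolution; keep the last positive region
        region = hits[0]
    return region
```

For example, a simulated assay that scores positive whenever a window contains residues 237-245 narrows a 631-residue protein (roughly the length of NS3 from the coordinates above) to the window 236-248.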

Epitopes recognized by a T cell receptor on an HCV-activated T cell can be
identified by, for example, a ⁵¹Cr release assay (see Example 2) or a lymphoproliferation
assay (see Example 4). In a ⁵¹Cr release assay, target cells that display the epitope of
interest can be constructed by cloning a polynucleotide encoding the epitope into an
expression vector and transforming the expression vector into the target cells. Non-structural
polypeptides can occur in any order in the fusion protein. If desired, at least 2, 3, 4, 5, 6, 7,
8, 9, or 10 or more copies of one or more of the polypeptides may occur in the fusion protein.
Multiple viral strains of HCV occur, and NS3, NS4, NS5a, and NS5b polypeptides of any
of these strains can be used in a fusion protein.

Nucleic acid and amino acid sequences of a number of HCV strains and isolates,
including nucleic acid and amino acid sequences of NS3, NS4, NS5a, NS5b genes and
polypeptides, have been determined. For example, isolate HCV J1.1 is described in Kubo
et al. (1989) Nucl. Acids Res. 17:10367-10372; Takeuchi et al. (1990) Gene
91:287-291; Takeuchi et al. (1990) J. Gen. Virol. 71:3027-3033; and Takeuchi et al.
(1990) Nucl. Acids Res. 18:4626. The complete coding sequences of two independent
isolates, HCV-J and BK, are described by Kato et al. (1990) Proc. Natl. Acad. Sci. USA
87:9524-9528 and Takamizawa et al. (1991) J. Virol. 65:1105-1113, respectively.

Publications that describe HCV-1 isolates include Choo et al. (1990) Brit. Med.
Bull. 46:423-441; Choo et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455; and Han et
al. (1991) Proc. Natl. Acad. Sci. USA 88:1711-1715. HCV isolates HC-J1 and HC-J4 are
described in Okamoto et al. (1991) Japan J. Exp. Med. 60:167-177. HCV isolates HCT
18, HCT 23, Th, HCT 27, EC1 and EC10 are described in Weiner et al. (1991) Virol.
180:842-848. HCV isolates Pt-1, HCV-K1 and HCV-K2 are described in Enomoto et al.
(1990) Biochem. Biophys. Res. Commun. 170:1021-1025. HCV isolates A, C, D & E are
described in Tsukiyama-Kohara et al. (1991) Virus Genes 5:243-254.

Each of the mutant HCV polypeptides containing at least portions of NS3, NS4 and
NS5 can be obtained from the same HCV strain or isolate or from different HCV strains or
isolates. Thus, each non-structural region of the polypeptide can be from the same HCV
strain or isolate or from different HCV strains or isolates. In addition to the mutant
HCV non-structural polypeptides described herein, the proteins can contain other
polypeptides derived from the HCV polyprotein. For example, it may be desirable to
include polypeptides derived from the core region of the HCV polyprotein. This region
occurs at amino acid positions 1-191 of the HCV polyprotein, numbered relative to HCV-1.
Either the full-length protein or epitopes of the full-length protein may be used in the
subject fusions, such as those epitopes found between amino acids 10-53, amino acids
10-45, amino acids 67-88, amino acids 120-130, or any of the core epitopes identified in, e.g.,
Houghton et al., U.S. Patent No. 5,350,671; Chien et al., Proc. Natl. Acad. Sci. USA (1992)
89:10011-10015; Chien et al., J. Gastroent. Hepatol. (1993) 8:S33-39; Chien et al.,
International Publication No. WO 93/00365; Chien, D.Y., International Publication No.
WO 94/01778; and commonly owned U.S. Patent No. 6,150,087, the disclosures of which
are incorporated herein by reference in their entireties. When present, additional HCV
polypeptides such as core can be obtained from the same HCV strain or isolate as the
non-structural polypeptides or from different HCV strains or isolates.

Preferably, the above-described mutant proteins, as well as the individual
components of these proteins, are produced recombinantly. A polynucleotide encoding
these proteins can be introduced into an expression vector that can be expressed in a
suitable expression system. A variety of bacterial, yeast, mammalian, insect and plant
expression systems are available in the art, and any such expression system can be used.
Optionally, a polynucleotide encoding these proteins can be translated in a cell-free
translation system. Such methods are well known in the art. The proteins also can be
constructed by solid-phase protein synthesis.

If desired, the mutant polypeptides, or the individual components of these
polypeptides, also can contain other amino acid sequences, such as amino acid linkers or
signal sequences, as well as ligands useful in protein purification, such as glutathione-S-
transferase and staphylococcal protein A.

Polynucleotides 

The polynucleotides of the present invention are not necessarily physically derived
from the nucleotide sequences shown, but can be generated in any manner, including, for
example, chemical synthesis, DNA replication, reverse transcription, or transcription.
In addition, combinations of regions corresponding to the designated sequences can
be modified in ways known in the art to be consistent with an intended use.

The DNA encoding the desired polypeptide, whether in fused or mature form, and
whether or not containing a signal sequence to permit secretion, can be ligated into
expression vectors suitable for any convenient host. Both eukaryotic and prokaryotic host
systems are presently used in forming recombinant polypeptides, and a summary of some
of the more common control systems and host cells is given below. The polypeptide
produced in such host cells is then isolated from lysed cells or from the culture medium
and purified to the extent needed for its intended use.

Purification can be by techniques known in the art, for example, differential
extraction, salt fractionation, chromatography on ion exchange resins, affinity
chromatography, centrifugation, alkali resolubilization of insoluble protein, and the like.
See, for example, Methods in Enzymology for a variety of methods for purifying proteins.
Polynucleotides contain less than an entire HCV genome and can be RNA or
single- or double-stranded DNA. Preferably, the polynucleotides are isolated free of other
components, such as proteins and lipids. Polynucleotides of the invention can also
comprise other nucleotide sequences, such as sequences coding for linkers, signal
sequences, or ligands useful in protein purification such as glutathione-S-transferase and
staphylococcal protein A.

Polynucleotides encoding mutant HCV non-structural polypeptides can be isolated
from a genomic library derived from nucleic acid sequences present in, for example, the
plasma, serum, or liver homogenate of an HCV-infected individual, or can be synthesized in
the laboratory, for example, using an automatic synthesizer. An amplification method such
as PCR can be used to amplify polynucleotides from either HCV genomic RNA (after
reverse transcription) or cDNA.

Further, while the polypeptides of the present invention that are not NS3, NS4, or
NS5 of HCV can comprise a substantially complete viral domain, in many applications
all that is required is that the polypeptide comprise an antigenic or immunogenic region of
the virus. An antigenic region of a polypeptide is generally relatively small, typically 8 to
10 amino acids or fewer in length. Fragments of as few as 5 amino acids can characterize an
antigenic region. These segments can correspond to regions of, for example, C, E1, or E2
epitopes. Accordingly, using the cDNAs of C, E1, or E2 as a basis, DNAs encoding short
segments of C, E1, or E2 polypeptides can be expressed recombinantly either as fusion
proteins or as isolated polypeptides. In addition, short amino acid sequences can be
conveniently obtained by chemical synthesis.

Polynucleotides encoding the polypeptides described herein can comprise coding
sequences for these polypeptides that occur naturally or can be artificial sequences that
do not occur in nature. These polynucleotides can be ligated to form a coding sequence for
the fusion proteins using standard molecular biology techniques. If desired,
polynucleotides can be cloned into an expression vector and transformed into, for example,
bacterial, yeast, insect, plant or mammalian cells so that the fusion proteins of the invention
can be expressed in and isolated from a cell culture.
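Before a ligated fusion is cloned, its reading frame can be checked in silico. The sketch below translates a DNA coding sequence with the standard genetic code and reports whether it begins with ATG, ends with a stop codon, and contains no internal stop. It is an illustrative check rather than a step required by the text, and the helper names `translate` and `is_open_fusion` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code: amino acids for codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("length is not a multiple of 3; the reading frame is broken")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_open_fusion(cds: str) -> bool:
    """True if the sequence starts with ATG, ends with a stop codon, and has no internal stop."""
    protein = translate(cds)
    return cds.upper().startswith("ATG") and protein.endswith("*") and "*" not in protein[:-1]


if __name__ == "__main__":
    print(translate("ATGGCTATGAGCTAA"), is_open_fusion("ATGGCTATGAGCTAA"))
```

A junction that introduces a stop codon (for example `ATGTAAGCTTAA`) fails the check, and a junction that shifts the frame raises `ValueError`.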

The expression of polypeptides containing these domains in a variety of
recombinant host cells, including, for example, bacteria, yeast, insect, plant and vertebrate
cells, gives rise to important immunological reagents that can be used for diagnosis,
detection, and vaccines.

The general techniques used in extracting the genome from a virus, preparing and
probing a cDNA library, sequencing clones, constructing expression vectors, transforming
cells, performing immunological assays such as radioimmunoassays and ELISA assays,
growing cells in culture, and the like are known in the art, and laboratory manuals
describing these techniques are available. However, as a general guide, the following sets
forth some sources currently available for such procedures, and for materials useful in
carrying them out.

Both prokaryotic and eukaryotic host cells may be used for expression of desired
coding sequences when appropriate control sequences that are compatible with the
designated host are used. Among prokaryotic hosts, E. coli is most frequently used.
Expression control sequences for prokaryotes include promoters, optionally containing
operator portions, and ribosome binding sites. Transfer vectors compatible with
prokaryotic hosts are commonly derived from, for example, pBR322, a plasmid containing
operons conferring ampicillin and tetracycline resistance, and the various pUC vectors,
which also contain sequences conferring antibiotic resistance markers. These markers may
be used to obtain successful transformants by selection. Commonly used prokaryotic
control sequences include the beta-lactamase (penicillinase) and lactose promoter systems
(Chang et al. (1977) Nature 198:1056), the tryptophan (trp) promoter system (Goeddel et
al. (1980) Nucleic Acids Res. 8:4057), the lambda-derived PL promoter and N gene
ribosome binding site (Shimatake et al. (1981) Nature 292:128), and the hybrid tac
promoter (De Boer et al. (1983) Proc. Natl. Acad. Sci. USA 80:21) derived from
sequences of the trp and lac UV5 promoters. The foregoing systems are particularly
compatible with E. coli; if desired, other prokaryotic hosts such as strains of Bacillus or
Pseudomonas may be used, with corresponding control sequences.

Eukaryotic hosts include mammalian and yeast cells in culture systems.
Mammalian cell lines available as hosts for expression are known in the art and include
many immortalized cell lines available from the American Type Culture Collection
(ATCC), including HeLa cells, Chinese hamster ovary (CHO) cells, baby hamster kidney
(BHK) cells, and a number of other cell lines. Suitable promoters for mammalian cells are
also known in the art and include viral promoters such as those from Simian Virus 40
(SV40) (Fiers (1978) Nature 273:113), Rous sarcoma virus (RSV), adenovirus (ADV),
and bovine papilloma virus (BPV). Mammalian cells may also require terminator
sequences and poly A addition sequences; enhancer sequences that increase expression
may also be included, and sequences that cause amplification of the gene may also be
desirable. These sequences are known in the art. Vectors suitable for replication in
mammalian cells may include viral replicons, or sequences that ensure integration of the
appropriate sequences encoding NANBV epitopes into the host genome.

The vaccinia virus system can also be used to express foreign DNA in mammalian
cells. To express heterologous genes, the foreign DNA is usually inserted into the
thymidine kinase gene of the vaccinia virus, and infected cells can then be selected. This
procedure is known in the art, and further information can be found in Mackett et al.
(1984) J. Virol. 49:857-864 and in Chapter 7 of DNA Cloning, Vol. 2 (IRL Press).

Yeast expression systems are also known to one of ordinary skill in the art. A yeast
promoter is any DNA sequence capable of binding yeast RNA polymerase and initiating
the downstream (3') transcription of a coding sequence (e.g., structural gene) into mRNA.
A promoter will have a transcription initiation region, which is usually placed proximal to
the 5' end of the coding sequence. This transcription initiation region usually includes an
RNA polymerase binding site (the "TATA Box") and a transcription initiation site. A
yeast promoter may also have a second domain called an upstream activator sequence
(UAS), which, if present, is usually distal to the structural gene. The UAS permits
regulated (inducible) expression. Constitutive expression occurs in the absence of a UAS.
Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription.
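The promoter anatomy described above (an initiation region plus an optional UAS that confers regulated rather than constitutive expression) can be captured in a toy data model. This is a conceptual sketch only: the class `YeastPromoter`, the function `hybrid`, and the placeholder region labels are hypothetical, and no real sequences are represented. The `hybrid` helper mirrors the document's description of hybrid promoters, in which the UAS of one yeast promoter (for example ADH2) is joined to the transcription activation region of another (for example GAP).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class YeastPromoter:
    """Toy model: a transcription initiation region plus an optional UAS."""
    name: str
    initiation_region: str          # TATA box and transcription initiation site
    uas: Optional[str] = None       # upstream activator sequence, if present

    def expression_mode(self) -> str:
        # Per the text: a UAS permits regulated expression; without one, expression is constitutive.
        return "regulated" if self.uas is not None else "constitutive"


def hybrid(uas_donor: YeastPromoter, initiation_donor: YeastPromoter) -> YeastPromoter:
    """Join the UAS of one promoter to the initiation region of another."""
    if uas_donor.uas is None:
        raise ValueError(f"{uas_donor.name} has no UAS to donate")
    return YeastPromoter(
        name=f"{uas_donor.name}/{initiation_donor.name}",
        initiation_region=initiation_donor.initiation_region,
        uas=uas_donor.uas,
    )


if __name__ == "__main__":
    adh2 = YeastPromoter("ADH2", "ADH2 initiation region", uas="ADH2 UAS")
    gap = YeastPromoter("GAP", "GAP initiation region", uas="GAP UAS")
    print(hybrid(adh2, gap))
```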

Yeast is a fermenting organism with an active metabolic pathway; therefore,
sequences encoding enzymes in the metabolic pathway provide particularly useful
promoter sequences. Examples include alcohol dehydrogenase (ADH) (EP-A-0 284 044),
enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate
dehydrogenase (GAP or GAPDH), hexokinase, phosphofractokinase, 3-phosphoglycerate
mutase, and pyruvate kinase (PyK) (EP-A-0 329 203). The yeast PHO5 gene, encoding
acid phosphatase, also provides useful promoter sequences (Miyanohara et al. (1983) Proc.
Natl. Acad. Sci. USA 80:1).

In addition, synthetic promoters that do not occur in nature also function as yeast
promoters. For example, UAS sequences of one yeast promoter may be joined with the
transcription activation region of another yeast promoter, creating a synthetic hybrid
promoter. Examples of such hybrid promoters include the ADH regulatory sequence
linked to the GAP transcription activation region (US Patent Nos. 4,876,197 and
4,880,734). Other examples of hybrid promoters include promoters that consist of the
regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with
the transcriptional activation region of a glycolytic enzyme gene such as GAP or PyK
(EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters of
non-yeast origin that have the ability to bind yeast RNA polymerase and initiate
transcription. Examples of such promoters include, inter alia, those described in Cohen et al.
(1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981) Nature 253:835;
Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al. (1979)
"The Expression of Bacterial Antibiotic Resistance Genes in the Yeast Saccharomyces
cerevisiae," in: Plasmids of Medical, Environmental and Commercial Importance (eds.
K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163; and Panthier
et al. (1980) Curr. Genet. 2:109.

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence
may be directly linked with the DNA molecule, in which case the first amino acid at the N-
terminus of the recombinant protein will always be a methionine, which is encoded by the
ATG start codon. If desired, methionine at the N-terminus may be cleaved from the
protein by in vitro incubation with cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as for
mammalian, baculovirus, and bacterial expression systems. Usually, a DNA sequence
encoding the N-terminal portion of an endogenous yeast protein, or other stable protein, is
fused to the 5' end of heterologous coding sequences. Upon expression, this construct will
provide a fusion of the two amino acid sequences. For example, the yeast or human
superoxide dismutase (SOD) gene can be linked at the 5' terminus of a foreign gene and
expressed in yeast. The DNA sequence at the junction of the two amino acid sequences
may or may not encode a cleavable site. See, e.g., EP-A-0 196 056. Another example is a
ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that
preferably retains a site for a processing enzyme (e.g., ubiquitin-specific processing
protease) to cleave the ubiquitin from the foreign protein. Through this method, therefore,
native foreign protein can be isolated (e.g., WO88/024066).
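The text notes that an N-terminal methionine can be removed with cyanogen bromide and that fusion junctions may or may not encode a cleavable site. Cyanogen bromide cleaves on the C-terminal side of every methionine, not only the N-terminal one, so it can be useful to check which fragments a given sequence would yield. This is a minimal sketch on one-letter protein sequences with a hypothetical helper name; it assumes complete cleavage and ignores the chemistry (the cleaved methionine ends up as homoserine or its lactone).

```python
import re
from typing import List


def cnbr_fragments(protein: str) -> List[str]:
    """Split a one-letter protein sequence after every methionine (M)."""
    # (?<=M) is a zero-width lookbehind: split points fall just after each M.
    return [fragment for fragment in re.split(r"(?<=M)", protein) if fragment]


if __name__ == "__main__":
    print(cnbr_fragments("MAKMGS"))
```

A sequence whose only methionine is the initiator yields just that residue plus the rest of the protein; any internal methionine fragments the product further.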

Alternatively, foreign proteins can also be secreted from the cell into the growth
media by creating chimeric DNA molecules that encode a fusion protein comprised of a
leader sequence fragment that provides for secretion in yeast of the foreign protein.
Preferably, there are processing sites encoded between the leader fragment and the foreign
gene that can be cleaved either in vivo or in vitro. The leader sequence fragment usually
encodes a signal peptide comprised of hydrophobic amino acids that directs the secretion
of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted
yeast proteins, such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and
the A-factor gene (US patent 4,588,684). Alternatively, leaders of non-yeast origin, such
as an interferon leader, exist that also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast
alpha-factor gene, which contains both a "pre" signal sequence and a "pro" region. The
types of alpha-factor fragments that can be employed include the full-length pre-pro alpha-
factor leader (about 83 amino acid residues) as well as truncated alpha-factor leaders
(usually about 25 to about 50 amino acid residues) (US Patents 4,546,083 and 4,870,008;
EP-A-0 324 274). Additional leaders employing an alpha-factor leader fragment that
provides for secretion include hybrid alpha-factor leaders made with a presequence of a
first yeast, but a pro-region from a second yeast alpha-factor. (See, e.g., WO 89/02463.)

Usually, transcription termination sequences recognized by yeast are regulatory
regions located 3' to the translation stop codon, and thus, together with the promoter, flank
the coding sequence. These sequences direct the transcription of an mRNA that can be
translated into the polypeptide encoded by the DNA. Examples of transcription terminator
sequences include yeast-recognized termination sequences such as those of genes coding
for glycolytic enzymes.

Usually, the above-described components, comprising a promoter, leader (if
desired), coding sequence of interest, and transcription termination sequence, are put
together into expression constructs. Expression constructs are often maintained in a
replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable
maintenance in a host, such as yeast or bacteria. The replicon may have two replication
systems, thus allowing it to be maintained, for example, in yeast for expression and in a
prokaryotic host for cloning and amplification. Examples of such yeast-bacteria shuttle
vectors include YEp24 (Botstein et al. (1979) Gene 8:17-24), pC1/1 (Brake et al. (1984)
Proc. Natl. Acad. Sci. USA 81:4642-4646), and YRp17 (Stinchcomb et al. (1982) J.
Mol. Biol. 158:157). In addition, a replicon may be either a high or low copy number
plasmid. A high copy number plasmid will generally have a copy number ranging from
about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably have at least about 10, and more preferably at least about
20. Either a high or low copy number vector may be selected, depending upon the effect of
the vector and the foreign protein on the host. See, e.g., Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome
with an integrating vector. Integrating vectors usually contain at least one sequence
homologous to a yeast chromosome that allows the vector to integrate, and preferably
contain two homologous sequences flanking the expression construct. Integrations appear
to result from recombinations between homologous DNA in the vector and the yeast
chromosome (Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245). An
integrating vector may be directed to a specific locus in yeast by selecting the appropriate
homologous sequence for inclusion in the vector. See Orr-Weaver et al., supra. One or
more expression constructs may integrate, possibly affecting levels of recombinant protein
produced (Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750). The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which
results in the integration of the entire vector, or as two segments homologous to adjacent
segments in the chromosome and flanking the expression construct in the vector, which
can result in the stable integration of only the expression construct.
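The two integration outcomes described above (single homologous segment versus two flanking segments) can be summarized with a simplified model. In this sketch the vector is treated as a linear list of named elements, the homology segment names (`HOM5`, `HOM3`) and element names are hypothetical, and circular topology and the fate of the homologous sequences themselves are ignored.

```python
from typing import List, Sequence


def integrated_elements(vector: Sequence[str], homology_segments: Sequence[str]) -> List[str]:
    """Elements expected in the chromosome after homologous integration (simplified)."""
    positions = sorted(vector.index(segment) for segment in homology_segments)
    if len(positions) == 1:
        # One homologous segment: recombination inserts the entire vector.
        return list(vector)
    if len(positions) == 2:
        left, right = positions
        # Two segments flanking the construct: only the construct between them is retained.
        return list(vector[left + 1:right])
    raise ValueError("expected one or two homologous segments")


if __name__ == "__main__":
    vec = ["ori", "ampR", "HOM5", "promoter", "cds", "terminator", "HOM3"]
    print(integrated_elements(vec, ["HOM5", "HOM3"]))
```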

Usually, extrachromosomal and integrating expression constructs may contain
selectable markers to allow for the selection of yeast strains that have been transformed.
Selectable markers may include biosynthetic genes that can be expressed in the yeast host,
such as ADE2, HIS4, LEU2, and TRP1, as well as ALG7 and the G418 resistance gene, which
confer resistance in yeast cells to tunicamycin and G418, respectively. In addition, a suitable
selectable marker may also provide yeast with the ability to grow in the presence of toxic
compounds, such as metals. For example, the presence of CUP1 allows yeast to grow in the
presence of copper ions (Butt et al. (1987) Microbiol. Rev. 51:351).
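Each marker named above implies a selection condition for recovering transformants. This lookup-table sketch assumes that the auxotrophic markers (ADE2, HIS4, LEU2, TRP1) are used in a host carrying the corresponding mutation with selection on medium lacking the matching nutrient; the key `G418R` is a hypothetical label for the G418 resistance gene, and the function name is illustrative.

```python
from typing import Dict, List

# Selection condition implied by each marker named in the text.
SELECTION_CONDITIONS: Dict[str, str] = {
    "ADE2": "medium lacking adenine (ade2 host)",
    "HIS4": "medium lacking histidine (his4 host)",
    "LEU2": "medium lacking leucine (leu2 host)",
    "TRP1": "medium lacking tryptophan (trp1 host)",
    "ALG7": "medium containing tunicamycin",
    "G418R": "medium containing G418",
    "CUP1": "medium containing copper ions",
}


def selection_plan(markers: List[str]) -> List[str]:
    """Describe how transformants carrying each marker would be selected."""
    unknown = [m for m in markers if m not in SELECTION_CONDITIONS]
    if unknown:
        raise KeyError(f"no selection condition recorded for: {', '.join(unknown)}")
    return [f"{m}: {SELECTION_CONDITIONS[m]}" for m in markers]


if __name__ == "__main__":
    for line in selection_plan(["LEU2", "CUP1"]):
        print(line)
```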

Alternatively, some of the above-described components can be put together into
transformation vectors. Transformation vectors usually comprise a selectable
marker that is either maintained in a replicon or developed into an integrating vector, as
described above.

Expression and transformation vectors, either extrachromosomal replicons or
integrating vectors, have been developed for transformation into many yeasts. For
example, expression vectors have been developed for, inter alia, the following yeasts:
Candida albicans (Kurtz et al. (1986) Mol. Cell. Biol. 6:142), Candida maltosa (Kunze
et al. (1985) J. Basic Microbiol. 25:141), Hansenula polymorpha (Gleeson et al. (1986)
J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302),
Kluyveromyces fragilis (Das et al. (1984) J. Bacteriol. 158:1165), Kluyveromyces lactis
(De Louvencourt et al. (1983) J. Bacteriol. 154:137; Van den Berg et al. (1990)
Bio/Technology 8:135), Pichia guilliermondii (Kunze et al. (1985) J. Basic Microbiol.
25:141), Pichia pastoris (Cregg et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos.
4,837,148 and 4,929,555), Saccharomyces cerevisiae (Hinnen et al. (1978) Proc. Natl.
Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol. 153:163), Schizosaccharomyces
pombe (Beach and Nurse (1981) Nature 300:706), and Yarrowia lipolytica (Davidow et
al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr. Genet. 10:49).

Methods of introducing exogenous DNA into yeast hosts are well known in the art,
and usually include either the transformation of spheroplasts or of intact yeast cells treated
with alkali cations. Transformation procedures usually vary with the yeast species to be
transformed. (See, e.g., Kurtz et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J.
Basic Microbiol. 25:141; Candida; Gleeson et al. (1986) J. Gen. Microbiol. 132:3459;
Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302; Hansenula; Das et al. (1984) J.
Bacteriol. 158:1165; De Louvencourt et al. (1983) J. Bacteriol. 154:1165; Van den Berg
et al. (1990) Bio/Technology 8:135; Kluyveromyces; Cregg et al. (1985) Mol. Cell. Biol.
5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent Nos. 4,837,148 and
4,929,555; Pichia; Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al.
(1983) J. Bacteriol. 153:163; Saccharomyces; Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces; Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al.
(1985) Curr. Genet. 10:49; Yarrowia.)

Bacterial expression techniques are known in the art. A bacterial promoter is any
DNA sequence capable of binding bacterial RNA polymerase and initiating the
downstream (3') transcription of a coding sequence (e.g., structural gene) into mRNA. A
promoter will have a transcription initiation region, which is usually placed proximal to the
5' end of the coding sequence. This transcription initiation region usually includes an RNA
polymerase binding site and a transcription initiation site. A bacterial promoter may also
have a second domain called an operator, which may overlap an adjacent RNA polymerase
binding site at which RNA synthesis begins. The operator permits negatively regulated
(inducible) transcription, as a gene repressor protein may bind the operator and thereby
inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may
be achieved by a gene activator protein binding sequence, which, if present, is usually
proximal (5') to the RNA polymerase binding sequence. An example of a gene activator
protein is the catabolite activator protein (CAP), which helps initiate transcription of the
lac operon in Escherichia coli (E. coli) (Raibaud et al. (1984) Annu. Rev. Genet.
18:173). Regulated expression may therefore be either positive or negative, thereby either
enhancing or reducing transcription.
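The negative (operator/repressor) and positive (activator) controls described above combine logically. This is a simplified boolean sketch, not a quantitative model: it ignores basal (leaky) transcription and promoter strength, and the class and function names are hypothetical. The lac-like example treats "inducer present" as releasing the repressor and `cap_bound` as CAP occupying its binding site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacterialPromoter:
    """Toy model of a promoter with optional negative and positive control elements."""
    has_operator: bool = False      # negative control: repressor binding site
    needs_activator: bool = False   # positive control: activator binding site required

    def transcribes(self, repressor_bound: bool = False, activator_bound: bool = False) -> bool:
        if self.has_operator and repressor_bound:
            return False  # repressor on the operator blocks transcription
        if self.needs_activator and not activator_bound:
            return False  # promoter depends on an activator such as CAP
        return True


LAC_LIKE = BacterialPromoter(has_operator=True, needs_activator=True)


def lac_like_on(inducer_present: bool, cap_bound: bool) -> bool:
    # The inducer releases the repressor from the operator.
    return LAC_LIKE.transcribes(repressor_bound=not inducer_present, activator_bound=cap_bound)


if __name__ == "__main__":
    for inducer in (False, True):
        for cap in (False, True):
            print(f"inducer={inducer} cap={cap} -> {lac_like_on(inducer, cap)}")
```

A promoter with neither element (`BacterialPromoter()`) transcribes unconditionally, matching the constitutive case in the text.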

Expression and transformation vectors, either extrachromosomal replicons or
integrating vectors, have been developed for transformation into many bacteria. For
example, expression vectors have been developed for, inter alia, the following bacteria:
Bacillus subtilis (Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036
259 and EP-A-0 063 953; WO 84/04541), Escherichia coli (Shimatake et al. (1981)
Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al. (1986) J. Mol. Biol.
189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907), Streptococcus cremoris
(Powell et al. (1988) Appl. Environ. Microbiol. 54:655), Streptococcus lividans (Powell
et al. (1988) Appl. Environ. Microbiol. 54:655), and Streptomyces lividans (US patent
4,745,056).

Methods of introducing exogenous DNA into bacterial hosts are well known in the
art, and usually include either the transformation of bacteria treated with CaCl₂ or other
agents, such as divalent cations and DMSO. DNA can also be introduced into bacterial
cells by electroporation. Transformation procedures usually vary with the bacterial species
to be transformed. (See, e.g., Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva
et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953;
WO 84/04541; Bacillus; Miller et al. (1988) Proc. Natl. Acad. Sci. 85:856; Wang et al.
(1990) J. Bacteriol. 172:949; Campylobacter; Cohen et al. (1973) Proc. Natl. Acad.
Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127; Kushner (1978) "An
improved method for transformation of Escherichia coli with ColE1-derived plasmids," in:
Genetic Engineering: Proceedings of the International Symposium on Genetic Engineering
(eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318; Escherichia; Chassy et al. (1987) FEMS
Microbiol. Lett. 44:173; Lactobacillus; Fiedler et al. (1988) Anal. Biochem. 170:38;
Pseudomonas; Augustin et al. (1990) FEMS Microbiol. Lett. 66:203; Staphylococcus;
Barany et al. (1980) J. Bacteriol. 144:698; Harlander (1987) "Transformation of
Streptococcus lactis by electroporation," in: Streptococcal Genetics (eds. J. Ferretti and R.
Curtiss III); Perry et al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl.
Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th Eur. Cong. Biotechnology
1:412; Streptococcus.)

In addition, viral antigens can be expressed in insect cells by the Baculovirus
system. A general guide to Baculovirus expression by Summers and Smith is A Manual of
Methods for Baculovirus Vectors and Insect Cell Culture Procedures (Texas Agricultural
Experiment Station Bulletin No. 1555). To incorporate the heterologous gene into the
Baculovirus genome, the gene is first cloned into a transfer vector containing some
Baculovirus sequences. This transfer vector, when it is cotransfected with wild-type virus
into insect cells, will recombine with the wild-type virus. Usually, the transfer vector will
be engineered so that the heterologous gene will disrupt the wild-type Baculovirus
polyhedrin gene. This disruption enables easy selection of the recombinant virus, since the
cells infected with the recombinant virus will appear phenotypically different from the cells
infected with the wild-type virus. The purified recombinant virus can be used to infect cells
to express the heterologous gene. The foreign protein can be secreted into the medium if a
signal peptide is linked in frame to the heterologous gene; otherwise, the protein will be
bound in the cell lysates. For further information, see Smith et al. (1983) Mol. Cell. Biol.
3:2156-2165 or Luckow and Summers (1989) Virology 170:31-39.

Recombinant expression can also be effected in plant cells. There are many plant cell culture and whole-plant genetic expression systems known in the art. Exemplary plant cellular genetic expression systems include those described in patents such as US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). In addition to the references described above, descriptions of plant protein signal peptides may be found in Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular Microbiology 3:3-14 (1989); and Yu et al., Gene 122:247-253 (1992). A description of the regulation of plant gene expression by the phytohormone gibberellic acid, and of secreted enzymes induced by gibberellic acid, can be found in R.L. Jones and J. MacMillin, "Gibberellins," in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984, Pitman Publishing Limited, London, pp. 21-52. References that describe other metabolically regulated genes include Sheen, Plant Cell 2:1027-1038 (1990); Maas et al., EMBO J. 9:3447-3452 (1990); and Benkel and Hickey, Proc. Natl. Acad. Sci. 84:1337-1339 (1987).
All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be transformed by the present invention so that whole plants are recovered which contain the transferred gene. It is known that practically all plants can be regenerated from cultured cells or tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and other trees, legumes, and vegetables. Some suitable plants include, for example, species from the genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium, Zea, Triticum, and Sorghum.

Transformation can be by any method for introducing polynucleotides into a host cell, including, for example, packaging the polynucleotide in a virus and transducing a host cell with the virus, and direct uptake of the polynucleotide. The transformation procedure used depends upon the host to be transformed. Bacterial transformation by direct uptake generally employs treatment with calcium or rubidium chloride (Cohen (1972), Proc. Natl. Acad. Sci. U.S.A. 69:2110; Maniatis et al. (1982), MOLECULAR CLONING: A LABORATORY MANUAL (Cold Spring Harbor Press, Cold Spring Harbor, N.Y.)). Yeast transformation by direct uptake may be carried out using the method of Hinnen et al. (1978) Proc. Natl. Acad. Sci. U.S.A. 75:1929. Mammalian transformations by direct uptake may be conducted using the calcium phosphate precipitation method of Graham and Van der Eb (1978), Virology 52:546, or the various known modifications thereof.

Vector construction employs techniques that are known in the art. Site-specific DNA cleavage is performed by treating with suitable restriction enzymes under conditions that generally are specified by the manufacturer of these commercially available enzymes. The cleaved fragments may be separated by polyacrylamide or agarose gel electrophoresis, according to the general procedures found in Methods in Enzymology (1980) 65:499-560. Sticky-ended cleavage fragments may be blunt-ended using E. coli DNA polymerase I (Klenow fragment) in the presence of the appropriate deoxynucleotide triphosphates (dNTPs). Treatment with S1 nuclease may also be used, resulting in the hydrolysis of any single-stranded DNA portions.
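
The restriction digestion step above can be previewed with a small in-silico calculation before any enzyme is used. The sketch below is an illustration only and not part of the described method: it assumes a single recognition sequence with a fixed cut position on the top strand (EcoRI, which recognizes GAATTC and cuts after the G, is used as the example), ignores sites that span the origin of a circular plasmid, and the function names and demo sequence are hypothetical.

```python
from __future__ import annotations


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by 1 so overlapping sites are found
    return positions


def digest(seq: str, site: str, cut_offset: int, circular: bool) -> list[int]:
    """Return fragment lengths (bp) after cutting `seq` at every `site`.

    `cut_offset` is where the enzyme cuts the top strand, counted from the
    start of the recognition sequence (1 for EcoRI, G^AATTC). Sites that
    span the origin of a circular molecule are not detected.
    """
    length = len(seq)
    cuts = sorted(pos + cut_offset for pos in find_sites(seq, site))
    if not cuts:
        return [length]  # uncut molecule
    if circular:
        # n cuts in a circle give n fragments; the last one wraps past the origin
        frags = [b - a for a, b in zip(cuts, cuts[1:])]
        frags.append(length - cuts[-1] + cuts[0])
        return frags
    bounds = [0] + cuts + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 21-bp sequence carrying two EcoRI sites.
demo = "AAAGAATTCAAAAGAATTCAA"
print(find_sites(demo, "GAATTC"))                 # [3, 13]
print(digest(demo, "GAATTC", 1, circular=False))  # [4, 10, 7]
print(digest(demo, "GAATTC", 1, circular=True))   # [10, 11]
```

The predicted fragment sizes are what one would expect to see as bands after gel electrophoresis; real enzymes may also be affected by methylation or star activity, which this sketch does not model.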

Ligations are carried out under standard buffer and temperature conditions using T4 DNA ligase and ATP; sticky-end ligations require less ATP and less ligase than blunt-end ligations. When vector fragments are used as part of a ligation mixture, the vector fragment is often treated with bacterial alkaline phosphatase (BAP) or calf intestinal alkaline phosphatase to remove the 5'-phosphate and thus prevent religation of the vector; alternatively, restriction enzyme digestion of unwanted fragments can be used to prevent ligation. Ligation mixtures are transformed into suitable cloning hosts, such as E. coli, and successful transformants are selected by, for example, antibiotic resistance, and screened for the correct construction.

Synthetic oligonucleotides may be prepared using an automated oligonucleotide synthesizer as described by Warner (1984), DNA 3:401. If desired, the synthetic strands may be labeled with ³²P by treatment with polynucleotide kinase in the presence of ³²P-ATP, using standard conditions for the reaction. DNA sequences, including those isolated from cDNA libraries, may be modified by known techniques, including, for example, site-directed mutagenesis, as described by Zoller (1982), Nucleic Acids Res. 10:6487.
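
Oligonucleotide-directed mutagenesis relies on a synthetic oligonucleotide that matches the template except at the bases to be changed. The sketch below shows one simple way such an oligonucleotide could be designed in silico; it is an assumed illustration, not Zoller's protocol. The function names are hypothetical, the default of 15 unchanged flanking bases per side is an arbitrary choice, and no melting-temperature or secondary-structure checks are made.

```python
# Base-pairing table; lower-case entries keep highlighted mutations visible.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case is preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def mutagenic_oligo(template: str, position: int, new_bases: str, flank: int = 15) -> str:
    """Return a top-strand oligo that replaces template[position:position+len(new_bases)]
    with `new_bases`, flanked by `flank` unchanged bases on each side.

    Mutated bases are returned in lower case so they are easy to spot.
    """
    template = template.upper()
    end = position + len(new_bases)
    if position < flank or end + flank > len(template):
        raise ValueError("not enough template sequence for the requested flanks")
    return template[position - flank:position] + new_bases.lower() + template[end:end + flank]


# Hypothetical 14-nt template; change the C at position 6 to T with 3-nt flanks.
oligo = mutagenic_oligo("AAAACCCGGGTTTT", 6, "T", flank=3)
print(oligo)                      # ACCtGGG
print(reverse_complement(oligo))  # CCCaGGT
```

Placing the mismatch in the middle of the oligonucleotide, with matched sequence on both sides, is what allows it to anneal stably to the template despite the change.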

The expression constructs of the present invention, including the desired fusion, or individual expression constructs comprising the individual components of these fusions, may be used for nucleic acid immunization to activate HCV-specific T cells, using standard gene delivery protocols. Methods for gene delivery are known in the art. See, e.g., U.S. Patent Nos. 5,399,346, 5,580,859, and 5,589,466, incorporated by reference herein in their entireties. Genes can be delivered either directly to the vertebrate subject or, alternatively, ex vivo to cells derived from the subject, with the cells then reimplanted in the subject. For example, the constructs can be delivered as plasmid DNA, e.g., contained within a plasmid such as pBR322, pUC, or ColE1.

Additionally, the expression constructs can be packaged in liposomes prior to delivery to the cells. Lipid encapsulation is generally accomplished using liposomes that are able to stably bind or entrap and retain nucleic acid. The ratio of condensed DNA to lipid preparation can vary but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight, Biochim. Biophys. Acta (1991) 1097:1-17; Straubinger et al., in Methods of Enzymology (1983), Vol. 101, pp. 512-527.
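
The DNA-to-lipid ratio above mixes units (milligrams of DNA against micromoles of lipid), so a short calculation may help when scaling a preparation. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the 700 g/mol molecular weight in the example is a placeholder, since the real value depends on the lipid chosen (for a mixed preparation such as DOTAP/DOPE, a weighted mean molar mass would be needed).

```python
def lipid_micromoles(dna_mg: float, umol_lipid_per_mg_dna: float = 1.0) -> float:
    """Micromoles of lipid for `dna_mg` of condensed DNA at the given ratio.

    The text gives about 1:1 (mg DNA : micromoles lipid) or more lipid,
    so ratios below 1.0 are rejected here.
    """
    if umol_lipid_per_mg_dna < 1.0:
        raise ValueError("ratio below 1:1 (mg DNA : umol lipid)")
    return dna_mg * umol_lipid_per_mg_dna


def lipid_mass_mg(umol_lipid: float, molecular_weight_g_per_mol: float) -> float:
    """Convert micromoles of lipid to milligrams (umol x g/mol = ug; /1000 -> mg)."""
    return umol_lipid * molecular_weight_g_per_mol / 1000.0


umol = lipid_micromoles(0.5)             # 0.5 mg DNA at 1:1 -> 0.5 umol lipid
print(umol, lipid_mass_mg(umol, 700.0))  # assumed MW of 700 g/mol -> 0.35 mg lipid
```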

Liposomal preparations for use with the present invention include cationic (positively charged), anionic (negatively charged), and neutral preparations, with cationic liposomes particularly preferred. Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available under the trademark Lipofectin from GIBCO BRL, Grand Island, NY. (See also Felgner et al., Proc. Natl. Acad. Sci. USA (1987) 84:7413-7416.) Other commercially available lipids include Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be prepared from readily available materials using techniques well known in the art. See, e.g., Szoka et al., Proc. Natl. Acad. Sci. USA (1978) 75:4194-4198; PCT Publication No. WO 90/11092 for a description of the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes. The various liposome-nucleic acid complexes are prepared using methods known in the art. See, e.g., Straubinger et al., in METHODS OF ENZYMOLOGY (1983), Vol. 101, pp. 512-527; Szoka et al., Proc. Natl. Acad. Sci. USA (1978) 75:4194-4198; Papahadjopoulos et al., Biochim. Biophys. Acta (1975) 394:483; Wilson et al., Cell (1979) 17:77; Deamer and Bangham, Biochim. Biophys. Acta (1976) 443:629; Ostro et al., Biochem. Biophys. Res. Commun. (1977) 76:836; Fraley et al., Proc. Natl. Acad. Sci. USA (1979) 76:3348; Enoch and Strittmatter, Proc. Natl. Acad. Sci. USA (1979) 76:145; Fraley et al., J. Biol. Chem. (1980) 255:10431; Szoka and Papahadjopoulos, Proc. Natl. Acad. Sci. USA (1978) 75:145; and Schaefer-Ridder et al., Science (1982) 215:166.

The DNA can also be delivered in cochleate lipid compositions similar to those described by Papahadjopoulos et al., Biochim. Biophys. Acta (1975) 394:483-491. See also U.S. Patent Nos. 4,663,161 and 4,871,488.

A number of viral-based systems have been developed for gene transfer into mammalian cells. For example, retroviruses such as murine sarcoma virus, mouse mammary tumor virus, Moloney murine leukemia virus, and Rous sarcoma virus provide a convenient platform for gene delivery systems. A selected gene can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to cells of the subject either in vivo or ex vivo. A number of retroviral systems have been described (U.S. Patent No. 5,219,740; Miller and Rosman, BioTechniques (1989) 7:980-990; Miller, A.D., Human Gene Therapy (1990) 1:5-14; Scarpa et al., Virology (1991) 180:849-852; Burns et al., Proc. Natl. Acad. Sci. USA (1993) 90:8033-8037; and Boris-Lawrie and Temin, Curr. Opin. Genet. Develop. (1993) 3:102-109). Briefly, retroviral gene delivery vehicles of the present invention may be readily constructed from a wide variety of retroviruses, including, for example, B, C, and D type retroviruses as well as spumaviruses and lentiviruses such as FIV, HIV, HIV-1, HIV-2, and SIV (see RNA Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985). Such retroviruses may be readily obtained from depositories or collections such as the American Type Culture Collection ("ATCC"; 10801 University Blvd., Manassas, VA 20110-2209), or isolated from known sources using commonly available techniques.

A number of adenovirus vectors have also been described, such as adenovirus Type 2 and Type 5 vectors. Unlike retroviruses, which integrate into the host genome, adenoviruses persist extrachromosomally, thus minimizing the risks associated with insertional mutagenesis (Haj-Ahmad and Graham, J. Virol. (1986) 57:267-274; Bett et al., J. Virol. (1993) 67:5911-5921; Mittereder et al., Human Gene Therapy (1994) 5:717-729; Seth et al., J. Virol. (1994) 68:933-940; Barr et al., Gene Therapy (1994) 1:51-58; Berkner, K.L., BioTechniques (1988) 6:616-629; and Rich et al., Human Gene Therapy (1993) 4:461-476).

Molecular conjugate vectors, such as the adenovirus chimeric vectors described in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl. Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.

Members of the Alphavirus genus, such as, but not limited to, vectors derived from Sindbis virus, Semliki Forest virus, and Venezuelan equine encephalitis virus (VEE), will also find use as viral vectors for delivering the gene of interest. For a description of Sindbis virus-derived vectors useful for the practice of the instant methods, see Dubensky et al., J. Virol. (1996) 70:508-519; and International Publication Nos. WO 95/07995 and WO 96/17072.
Other vectors can be used, including but not limited to simian virus 40 and cytomegalovirus. Bacterial vectors, such as Salmonella spp., Yersinia enterocolitica, Shigella spp., Vibrio cholerae, Mycobacterium strain BCG, and Listeria monocytogenes, can be used. Minichromosomes such as MC and MC1, bacteriophages, cosmids (plasmids into which phage lambda cos sites have been inserted), and replicons (genetic elements that are capable of replication under their own control in a cell) can also be used.

The expression constructs may also be encapsulated in, adsorbed to, or associated with particulate carriers. Such carriers present multiple copies of a selected molecule to the immune system and promote trapping and retention of molecules in local lymph nodes. The particles can be phagocytosed by macrophages and can enhance antigen presentation through cytokine release. Examples of particulate carriers include those derived from polymethyl methacrylate polymers, as well as microparticles derived from poly(lactides) and poly(lactide-co-glycolides), known as PLG. See, e.g., Jeffery et al., Pharm. Res. (1993) 10:362-368; and McGee et al., J. Microencap. (1996).

A wide variety of other methods can be used to deliver the expression constructs to cells. Such methods include DEAE dextran-mediated transfection, calcium phosphate precipitation, polylysine- or polyornithine-mediated transfection, or precipitation using other insoluble inorganic salts, such as strontium phosphate, aluminum silicates including bentonite and kaolin, chromic oxide, magnesium silicate, talc, and the like. Other useful methods of transfection include electroporation, sonoporation, protoplast fusion, liposomes, peptoid delivery, or microinjection. See, e.g., Sambrook et al., supra, for a discussion of techniques for transforming cells of interest; and Felgner, P.L., Advanced Drug Delivery Reviews (1990) 5:163-187, for a review of delivery systems useful for gene transfer. One particularly effective method of delivering DNA using electroporation is described in International Publication No. WO 00/45823.

Additionally, biolistic delivery systems employing particulate carriers such as gold and tungsten are especially useful for delivering the expression constructs of the present invention. The particles are coated with the construct to be delivered and accelerated to high velocity, generally under a reduced atmosphere, using a gunpowder discharge from a "gene gun." For a description of such techniques, and apparatuses useful therefor, see, e.g., U.S. Patent Nos. 4,945,050; 5,036,006; 5,100,792; 5,179,022; 5,371,015; and 5,478,744.

Compositions 

The invention also provides compositions comprising the HCV polypeptides or polynucleotides described herein. Such compositions are useful as diagnostics, for example, using the mutant polypeptides (or polynucleotides encoding these polypeptides) in diagnostic reagents. Diagnostics using polypeptides and polynucleotides are known to those of skill in the art.

In addition, immunogenic compounds can be prepared from one or more immunogenic polypeptides derived from the polypeptides described herein, for example, the ANS35 polypeptide. The preparation of immunogenic compounds that contain immunogenic polypeptide(s) as active ingredients is known to one skilled in the art. Typically, such immunogenic compounds are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid prior to injection can also be prepared. The preparation can also be emulsified, or the protein encapsulated in liposomes.

Immunogenic and diagnostic compositions of the invention preferably comprise a pharmaceutically acceptable carrier. The carrier should not itself induce the production of antibodies harmful to the host. Pharmaceutically acceptable carriers are well known to those in the art. Such carriers include, but are not limited to, large, slowly metabolized macromolecules, such as proteins; polysaccharides, such as latex-functionalized Sepharose, agarose, cellulose, cellulose beads, and the like; polylactic acids; polyglycolic acids; polymeric amino acids, such as polyglutamic acid, polylysine, and the like; amino acid copolymers; and inactive virus particles.

Pharmaceutically acceptable salts can also be used in compositions of the invention, for example, mineral salts such as hydrochlorides, hydrobromides, phosphates, or sulfates, as well as salts of organic acids such as acetates, propionates, malonates, or benzoates. Especially useful protein substrates are serum albumins, keyhole limpet hemocyanin, immunoglobulin molecules, thyroglobulin, ovalbumin, tetanus toxoid, and other proteins well known to those of skill in the art. Compositions of the invention can also contain liquids or excipients, such as water, saline, glycerol, dextrose, ethanol, or the like, singly or in combination, as well as substances such as wetting agents, emulsifying agents, or pH buffering agents. Liposomes, such as those described above, can also be used as a carrier for a composition of the invention.

If desired, co-stimulatory molecules that improve immunogen presentation to lymphocytes, such as B7-1 or B7-2, or cytokines such as GM-CSF, IL-2, and IL-12, can be included in a composition of the invention. Optionally, adjuvants can also be included in a composition. Adjuvants that can be used include, but are not limited to: (1) aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc.; (2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents such as muramyl peptides (see below) or bacterial cell wall components), such as, for example, (a) MF59 (PCT Publ. No. WO 90/14837), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing various amounts of MTP-PE), formulated into submicron particles using a microfluidizer such as the Model 110Y microfluidizer (Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP (see below), either microfluidized into a submicron emulsion or vortexed to generate a larger-particle-size emulsion, and (c) Ribi™ adjuvant system (RAS) (Ribi Immunochem, Hamilton, MT), containing 2% Squalene, 0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA), or particles generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (e.g., IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g., gamma interferon), macrophage colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc.; (6) detoxified mutants of a bacterial ADP-ribosylating toxin such as a cholera toxin (CT), a pertussis toxin (PT), or an E. coli heat-labile toxin (LT), particularly LT-K63, LT-R72, CT-S109, and PT-K9/G129 (see, e.g., WO 93/13302 and WO 92/19265); (7) other substances that act as immunostimulating agents to enhance the effectiveness of the composition; and (8) microparticles with adsorbed macromolecules, as described in copending U.S. Patent Application Serial No. 09/285,855 (filed April 2, 1999) and International Patent Application Serial No. PCT/US99/17308 (filed July 29, 1999). Alum and MF59 are preferred. The effectiveness of an adjuvant can be determined by measuring the amount of antibodies directed against an immunogenic polypeptide containing an HCV antigenic sequence after administration of that polypeptide in immunogenic compounds formulated with each of the various adjuvants.
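
One common way to express the comparison described above is as the fold increase in geometric mean antibody titer (GMT) between a group given antigen with an adjuvant and a group given antigen alone; geometric means are used because titers typically come from serial dilutions and are compared on a log scale. The sketch below is an assumed illustration, not the applicant's method: the group sizes and titers are hypothetical, the function name is invented, and no statistical test is performed.

```python
from __future__ import annotations

from statistics import geometric_mean


def fold_increase_gmt(adjuvanted_titers: list[float], control_titers: list[float]) -> float:
    """Ratio of geometric mean titers: adjuvanted group / antigen-alone group."""
    if not adjuvanted_titers or not control_titers:
        raise ValueError("each group needs at least one titer")
    return geometric_mean(adjuvanted_titers) / geometric_mean(control_titers)


# Hypothetical reciprocal endpoint titers (antigen alone vs. antigen + adjuvant)
control = [100, 400]       # GMT 200
adjuvanted = [800, 3200]   # GMT 1600
print(round(fold_increase_gmt(adjuvanted, control), 6))  # 8.0
```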

As mentioned above, muramyl peptides include, but are not limited to, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP), N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE), etc.

Thus, such recombinant or synthetic HCV polypeptides can be used in vaccines and as diagnostics. Further, antibodies raised against these polypeptides can also be used as diagnostics, or for passive immunotherapy. In addition, antibodies to these polypeptides are useful for isolating and identifying HCV particles.

Native HCV antigens can also be isolated from HCV virions. The virions can be grown in HCV-infected cells in tissue culture, or in an infected host.

Administration and Delivery 

The polynucleotide and polypeptide compositions described herein (e.g., immunogenic compounds) may be administered to a subject using any suitable delivery means. Methods of delivering nucleic acids into host cells are discussed above. Further, HCV polynucleotides and/or polypeptides can be administered parenterally, e.g., by injection (usually subcutaneously or intramuscularly), or transdermally or transcutaneously. Certain adjuvants, e.g., LTK63, LTR72, or PLG formulations, can be administered intranasally or orally. Additional formulations that are suitable for other modes of administration include suppositories. For suppositories, traditional binders and carriers can include, for example, polyalkylene glycols or triglycerides; such suppositories can be formed from mixtures containing the active ingredient in the range of 0.5% to 10%, preferably 1%-2%. Oral formulations include such normally employed excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate, and the like. These compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained-release formulations, or powders and contain 10%-95% of active ingredient, preferably 25%-70%.
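
The percentage ranges above translate directly into an amount of active ingredient per unit. The sketch below assumes that the percentages are by weight (the text does not say), and the unit masses in the example (a 2 g suppository and a 500 mg tablet) are hypothetical illustration values, as is the function name.

```python
# Percent ranges of active ingredient stated in the text (w/w assumed).
RANGES = {
    "suppository": (0.5, 10.0),  # preferably 1%-2%
    "oral": (10.0, 95.0),        # preferably 25%-70%
}


def active_ingredient_mg(unit_mass_mg: float, percent: float, form: str) -> float:
    """Mass of active ingredient in one unit (e.g. one suppository or tablet)."""
    low, high = RANGES[form]
    if not low <= percent <= high:
        raise ValueError(f"{percent}% is outside the {low}%-{high}% range for {form}")
    return unit_mass_mg * percent / 100.0


print(active_ingredient_mg(2000, 1.5, "suppository"))  # 30.0
print(active_ingredient_mg(500, 50, "oral"))           # 250.0
```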

The polypeptides of the present invention can be formulated into the immunogenic compound as neutral or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed with free amino groups of the peptide), which are formed with inorganic acids such as, for example, hydrochloric or phosphoric acid, or with organic acids such as acetic, oxalic, tartaric, maleic, and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, and from organic bases such as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine, and the like.

The immunogenic compounds are administered in a manner compatible with the dosage formulation, and in such amount as will be prophylactically and/or therapeutically effective. The quantity to be administered, which is generally in the range of 5 micrograms to 250 micrograms of polypeptide per dose, depends on the subject to be treated, the capacity of the subject's immune system to synthesize antibodies, and the degree of protection desired. Precise amounts of active ingredient required to be administered may depend on the judgment of the practitioner and can be peculiar to each subject.

The immunogenic compound can be given in a single dose schedule, or preferably in a multiple dose schedule. A multiple dose schedule is one in which a primary course of vaccination can consist of 1-10 separate doses, followed by other doses given at subsequent time intervals required to maintain and/or reinforce the immune response, for example, at 1-4 months for a second dose, and, if needed, a subsequent dose(s) after several months. Further, the course of administration may include polynucleotides and polypeptides, together or sequentially (for example, priming with a polynucleotide composition and boosting with a polypeptide composition). The dosage regimen will also, at least in part, be determined by the need of the individual and be dependent upon the judgment of the practitioner.

In certain embodiments, administration of the polynucleotides and polypeptides described herein is used to activate T cells. In addition to the practical advantages of simplicity of construction and modification, administration of polynucleotides encoding mutant NS polypeptides results in the synthesis of a mutant NS polypeptide in the host. Thus, these immunogens are presented to the host immune system with native post-translational modifications, structure, and conformation. The polynucleotides are preferably injected intramuscularly into a large mammal, such as a human, at a dose of 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5, or 10 mg/kg.

The proteins and/or polynucleotides can be administered either to a mammal that is not infected with HCV or to an HCV-infected mammal. The particular dosages of the polynucleotides or fusion proteins in a composition will depend on many factors including, but not limited to, the species, age, and general condition of the mammal to which the composition is administered, and the mode of administration of the composition. An effective amount of the composition of the invention can be readily determined using only routine experimentation. In vitro and in vivo models can be employed to identify appropriate doses. Generally, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5, or 10 mg will be administered to a large mammal, such as a baboon, chimpanzee, or human. If desired, co-stimulatory molecules or adjuvants can also be provided before, after, or together with the compositions.

Antibodies and Diagnostics 

Antibodies, both monoclonal and polyclonal, which are directed against HCV 
epitopes are particularly useful in diagnosis, and those which are neutralizing are useful in 
passive immunotherapy. Monoclonal antibodies, in particular, may be used to raise anti- 
idiotype antibodies. 

Anti-idiotype antibodies are immunoglobulins that carry an "internal image" of the antigen of the infectious agent against which protection is desired. Techniques for raising anti-idiotype antibodies are known in the art. See, e.g., Grzych (1985), Nature 316:74; MacNamara et al. (1984), Science 226:1325; Uytdehaag et al. (1985), J. Immunol. 134:1225. These anti-idiotype antibodies may also be useful for treatment and/or diagnosis of non-A, non-B hepatitis (NANBH), as well as for elucidation of the immunogenic regions of HCV antigens.

An immunoassay for viral antigen may use, for example, a monoclonal antibody directed towards a viral epitope, a combination of monoclonal antibodies directed towards epitopes of one viral polypeptide, monoclonal antibodies directed towards epitopes of different viral polypeptides, polyclonal antibodies directed towards the same viral antigen, polyclonal antibodies directed towards different viral antigens, or a combination of monoclonal and polyclonal antibodies.

Immunoassay protocols may be based, for example, upon competition, direct reaction, or sandwich-type assays. Protocols may also, for example, use solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody or polypeptide. The labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye molecules. Assays that amplify the signals from the probe are also known; examples include assays that utilize biotin and avidin, and enzyme-labeled and enzyme-mediated immunoassays, such as ELISA assays.

An enzyme-linked immunosorbent assay (ELISA) can be used to measure either antigen or antibody concentrations. This method depends upon conjugation of an enzyme to either an antigen or an antibody, and uses the bound enzyme activity as a quantitative label. To measure antibody, the known antigen is fixed to a solid phase (e.g., a microplate or plastic cup), incubated with test serum dilutions, washed, incubated with anti-immunoglobulin labeled with an enzyme, and washed again. Enzymes suitable for labeling are known in the art and include, for example, horseradish peroxidase. Enzyme activity bound to the solid phase is measured by adding the specific substrate and determining product formation or substrate utilization colorimetrically. The bound enzyme activity is a direct function of the amount of antibody bound.
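
The antibody ELISA described above yields one colorimetric reading per serum dilution, and those readings still have to be reduced to a single number. One common convention, assumed here rather than taken from the text, is the endpoint titer: the reciprocal of the highest dilution whose optical density (OD) stays above a cutoff, often derived from negative-control wells. The function name, the cutoff, and the readings below are hypothetical.

```python
from __future__ import annotations


def endpoint_titer(start_dilution: int, fold: int, ods: list[float], cutoff: float) -> int | None:
    """Reciprocal of the highest consecutive dilution whose OD exceeds `cutoff`.

    ods[i] is the reading at dilution 1:(start_dilution * fold**i).
    Returns None if even the first dilution is at or below the cutoff.
    """
    titer = None
    for i, od in enumerate(ods):
        if od <= cutoff:
            break
        titer = start_dilution * fold ** i
    return titer


# Hypothetical two-fold series starting at 1:100, with an assumed cutoff OD of 0.3
readings = [2.1, 1.6, 0.9, 0.45, 0.2, 0.1]
print(endpoint_titer(100, 2, readings, cutoff=0.3))  # 800
```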

To measure antigen, a known specific antibody is fixed to the solid phase, and the test material containing antigen is added. After an incubation, the solid phase is washed and a second, enzyme-labeled antibody is added. After washing, substrate is added, and enzyme activity is estimated colorimetrically and related to antigen concentration.
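
Relating enzyme activity to antigen concentration is usually done against a standard curve built from known amounts of antigen run on the same plate. Many laboratories fit a four-parameter logistic curve for this; the sketch below uses plain linear interpolation between adjacent standards as a simpler stand-in, which is an assumption and not the method of the source. It also assumes OD rises monotonically with concentration over the range of the standards. The standard values and the function name are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left


def concentration_from_od(std_conc: list[float], std_od: list[float], sample_od: float) -> float:
    """Read a sample concentration off a standard curve by linear interpolation.

    Readings outside the range of the standards are rejected rather than
    extrapolated.
    """
    pairs = sorted(zip(std_od, std_conc))
    ods = [od for od, _ in pairs]
    concs = [c for _, c in pairs]
    if not ods[0] <= sample_od <= ods[-1]:
        raise ValueError("sample OD is outside the range of the standards")
    i = bisect_left(ods, sample_od)
    if ods[i] == sample_od:
        return concs[i]
    od0, c0, od1, c1 = ods[i - 1], concs[i - 1], ods[i], concs[i]
    return c0 + (sample_od - od0) * (c1 - c0) / (od1 - od0)


# Hypothetical antigen standards (ng/mL) and their ODs
standards = [0.0, 10.0, 20.0, 40.0]
std_ods = [0.05, 0.35, 0.65, 1.25]
print(round(concentration_from_od(standards, std_ods, 0.5), 3))  # 15.0
```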
The HCV fusion proteins, such as NS3 mutant and core fusion proteins, can also be used to produce HCV-specific polyclonal and monoclonal antibodies. HCV-specific polyclonal and monoclonal antibodies specifically bind to HCV antigens.

Polyclonal antibodies can be produced by administering the fusion protein to a mammal, such as a mouse, a rabbit, a goat, or a horse. Serum from the immunized animal is collected, and the antibodies are purified from the serum by, for example, precipitation with ammonium sulfate, followed by chromatography, preferably affinity chromatography. Techniques for producing and processing polyclonal antisera are known in the art.

Monoclonal antibodies directed against HCV-specific epitopes present in the fusion proteins can also be readily produced. Normal B cells from a mammal, such as a mouse, immunized with, e.g., a mutant NS3 polypeptide or NS-core fusion protein, can be fused with, for example, HAT-sensitive mouse myeloma cells to produce hybridomas. Hybridomas producing HCV-specific antibodies can be identified using RIA or ELISA and isolated by cloning in semi-solid agar or by limiting dilution. Clones producing HCV-specific antibodies are isolated by another round of screening.
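
Limiting dilution cloning is usually planned with Poisson statistics: if cells are seeded at an average of λ cells per well, the fraction of wells receiving at least one cell is 1 - e^(-λ), and the probability that a well showing growth was seeded by exactly one cell is λe^(-λ) / (1 - e^(-λ)). The sketch below applies these standard formulas; it assumes cells are Poisson-distributed and that every seeded cell grows, and the function names are hypothetical.

```python
import math


def fraction_wells_with_cells(mean_cells_per_well: float) -> float:
    """Poisson probability that a well receives at least one cell: 1 - e^(-lambda)."""
    return 1.0 - math.exp(-mean_cells_per_well)


def probability_monoclonal(mean_cells_per_well: float) -> float:
    """Probability that a well with growth was seeded by exactly one cell.

    P(1) / P(>=1) = lambda * e^(-lambda) / (1 - e^(-lambda)),
    assuming every seeded cell grows and cells are Poisson-distributed.
    """
    lam = mean_cells_per_well
    if lam <= 0:
        raise ValueError("mean cells per well must be positive")
    p0 = math.exp(-lam)
    return lam * p0 / (1.0 - p0)


for lam in (0.3, 1.0):
    print(lam, round(fraction_wells_with_cells(lam), 3), round(probability_monoclonal(lam), 3))
```

At about 0.3 cells per well, roughly 86% of wells with growth are expected to be monoclonal, compared with about 58% at 1 cell per well, which is one reason a second round of screening and subcloning is commonly performed.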

Antibodies, either monoclonal or polyclonal, that are directed against HCV epitopes are particularly useful for detecting the presence of HCV or HCV antigens in a sample, such as a serum sample from an HCV-infected human. An immunoassay for an HCV antigen may utilize one antibody or several antibodies. An immunoassay for an HCV antigen may use, for example, a monoclonal antibody directed towards an HCV epitope, a combination of monoclonal antibodies directed towards epitopes of one HCV polypeptide, monoclonal antibodies directed towards epitopes of different HCV polypeptides, polyclonal antibodies directed towards the same HCV antigen, polyclonal antibodies directed towards different HCV antigens, or a combination of monoclonal and polyclonal antibodies. Immunoassay protocols may be based, for example, upon competition, direct reaction, or sandwich-type assays using, for example, labeled antibody. The labels may be, for example, fluorescent, chemiluminescent, or radioactive.

The polyclonal or monoclonal antibodies may further be used to isolate HCV
particles or antigens using immunoaffinity columns. The antibodies can be affixed to a solid
support by, for example, adsorption or by covalent linkage so that the antibodies retain
their immunoselective activity. Optionally, spacer groups may be included so that the
antigen binding site of the antibody remains accessible. The immobilized antibodies can
then be used to bind HCV particles or antigens from a biological sample, such as blood or
plasma. The bound HCV particles or antigens are recovered from the column matrix by,
for example, a change in pH.



Methods of Eliciting Immune Responses 

HCV-specific T cells that are activated by the above-described polypeptides,
expressed in vivo or in vitro, preferably recognize an epitope of an HCV polypeptide, such
as a mutant NS3 polypeptide, including an epitope of a mutant HCV polypeptide.
HCV-specific T cells can be CD8+ or CD4+.

HCV-specific CD8+ T cells preferably are cytotoxic T lymphocytes (CTL) which
can kill HCV-infected cells that display NS3, NS4, NS5a, or NS5b epitopes complexed with
an MHC class I molecule. HCV-specific CD8+ T cells may also express interferon-γ (IFN-γ).
HCV-specific CD8+ T cells can be detected by, for example, ⁵¹Cr release assays. ⁵¹Cr
release assays measure the ability of HCV-specific CD8+ T cells to lyse target cells
displaying a nonstructural (e.g., mutant NS) epitope. HCV-specific CD8+ T cells which
express IFN-γ can also be detected by immunological methods, preferably by intracellular
staining for IFN-γ after in vitro stimulation with a mutant NS polypeptide.

HCV-specific CD4+ T cells activated by the above-described polypeptides, expressed
in vivo or in vitro, and by combinations of the individual components of these proteins,
preferably recognize an epitope of a mutant non-structural polypeptide, including an
epitope of a mutant protein, that is bound to an MHC class II molecule on an HCV-infected
cell, and proliferate in response to stimulating mutant peptides.

HCV-specific CD4+ T cells can be detected by a lymphoproliferation assay.
Lymphoproliferation assays measure the ability of HCV-specific CD4+ T cells to
proliferate in response to an epitope.

Mutant NS polypeptides (or fusions thereof with core, envelope or other viral
polypeptides) can be used to activate HCV-specific T cells either in vitro or in vivo.
Activation of HCV-specific T cells can be used, inter alia, to provide model systems to
optimize CTL responses to HCV and to provide prophylactic or therapeutic treatment against
HCV infection. For in vitro activation, proteins are preferably supplied to T cells via a
plasmid or a viral vector, such as an adenovirus vector, as described above.

Polyclonal populations of T cells can be derived from the blood, and preferably
from peripheral lymphoid organs, such as lymph nodes, spleen, or thymus, of mammals
that have been infected with an HCV. Preferred mammals include mice, chimpanzees,
baboons, and humans. The HCV serves to expand the number of activated HCV-specific T
cells in the mammal. The HCV-specific T cells derived from the mammal can then be
restimulated in vitro by adding HCV epitopic peptides to the T cells. The HCV-specific T
cells can then be tested for, inter alia, proliferation (e.g., by lymphoproliferation assays
known in the art), the production of IFN-γ, and the ability to lyse target cells displaying
HCV NS epitopes in vitro.

The following examples are meant to illustrate the invention and are not meant to
limit it in any way. Those of ordinary skill in the art will recognize modifications within
the spirit and scope of the invention as set forth herein.
EXAMPLES

Example 1: Constructs 

pCMV-II: pCMV-II (Figure 7, SEQ ID NO:5) was created to contain the human
CMV promoter, enhancer, intron A, polylinker and the bovine growth hormone terminator
in a deleted-pUC backbone (Life Technologies).

pT7-HCV: pT7-HCV was created in a polylinker-modified pUC vector to contain
full-length HCV cDNA preceded by a synthetic T7 promoter. pT7-HCV also contains the
complete 5' UTR and the poly A version of the 3' UTR.

pCMV.ΔNS35: To generate pCMV.ΔNS35 (Figure 5, SEQ ID NO:3), a two-step
procedure was undertaken. First, a PCR product was generated from pT7-HCV that
corresponded to the following: a 5' EcoRI site, followed by the Kozak sequence
ACCATGG, in which the initiator ATG is followed by the codon for amino acid #1242, with
the product continuing to the StuI site. Second, the StuI to XbaI fragment from a
full-length genomic clone was isolated. The genomic clone consisted of the T7 promoter
fused to the full-length HCV cDNA with the poly A version of the 3' end, in a pUC vector.
Finally, the EcoRI-StuI and StuI-XbaI fragments were ligated into the pCMV-II expression
vector, transformed into HB101 competent cells and plated onto ampicillin (100 µg/ml).
Miniprep analyses led to the identification of the desired clone, which was amplified on a
larger scale using a Qiagen Gigaprep kit following the manufacturer's specifications. The
resulting clone was named pCMV.ΔNS35 (Figure 5, SEQ ID NO:3).
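The cloning steps in this example depend on locating enzyme recognition sites and the ACCATGG start context in the HCV and vector sequences. A minimal Python sketch of such a check is shown below as an illustration only; it is not part of the described procedure. The recognition sequences are the standard published sites for the enzymes named in this example, `find_sites` and `has_kozak_start` are hypothetical helper names, and the test fragment is the last 30 bases of the Figure 3 row numbered 1921 (near the EcoRI label).

```python
# Minimal sketch (hypothetical helpers): scan a DNA sequence for the restriction
# sites named in Example 1 and for the ACCATGG start context.

# Standard recognition sequences (5'->3'). Each one is palindromic, so scanning
# the top strand also finds the same sites on the bottom strand.
SITES = {
    "EcoRI": "GAATTC",
    "StuI": "AGGCCT",
    "XbaI": "TCTAGA",
    "NheI": "GCTAGC",
    "SalI": "GTCGAC",
    "NotI": "GCGGCCGC",
    "PstI": "CTGCAG",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "ScaI": "AGTACT",
    "SphI": "GCATGC",
    "BlnI": "CCTAGG",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of every recognition site found in seq."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # convert 0-based index to 1-based
            start = seq.find(site, start + 1)  # keep searching; allows overlaps
        if positions:
            hits[enzyme] = positions
    return hits


def has_kozak_start(seq: str, context: str = "ACCATGG") -> bool:
    """True if the Kozak-type start context occurs anywhere in seq."""
    return context in seq.upper()


if __name__ == "__main__":
    fragment = "CGTCGACCTAAGAATTCACCATGGCTGCAT"
    print(find_sites(fragment))       # {'EcoRI': [12], 'SalI': [2]}
    print(has_kozak_start(fragment))  # True
```

Because every enzyme listed recognizes a palindrome, a single-strand scan suffices here; a non-palindromic site would also require scanning the reverse complement.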

pd.ΔNS3NS5: As shown schematically in Figure 10, the yeast expression plasmid
pd.ΔNS3NS5 (SEQ ID NO:8) was constructed using restriction fragments obtained from
the mammalian expression plasmid pCMV.KM.ΔNS35. pCMV.KM.ΔNS35 is identical to
pCMV.ΔNS35 (Figure 5, SEQ ID NO:3) except that it contains a kanamycin resistance
gene in the viral backbone. pCMV.KM.ΔNS35 was digested with EcoRI and NheI to
obtain a 2895 bp EcoRI-NheI fragment. The EcoRI-NheI fragment was ligated into a pRSET
HindIII-NheI subcloning vector together with synthetic oligos (HE) spanning HindIII to
EcoRI. After sequence verification, pRSET HindIII-NheI #6 was digested with HindIII and
NheI to obtain a 2908 bp HindIII-NheI fragment.

pCMV.KM.ΔNS35 was linearized with XbaI and ligated with synthetic oligos (XS)
spanning XbaI to SalI. The ligation was digested with NheI and SalI to obtain a 2481 bp
NheI-SalI fragment. The fragment was ligated into a pET3a NheI-SalI subcloning vector.
After sequence verification, pET3a NheI-SalI #2 was digested with NheI and SalI to obtain
a 2481 bp NheI-SalI fragment. The BamHI-HindIII ADH2/GAPDH promoter fragment was then
ligated with the HindIII-NheI and NheI-SalI fragments into the pBS24.1 BamHI-SalI yeast
expression vector.

pd.ΔNS3NS5.PJ: pd.ΔNS3NS5.PJ (Figures 13 and 14; SEQ ID NO:10) was
generated to create a "perfect junction" at the 5' and 3' ends of the HCV coding region. At
the 5' end of pd.ΔNS3NS5, there were 6 extra bases between the yeast ADH2/GAPDH
promoter and the ATG of the polypeptide. At the 3' end, there were 52 bases of
untranslated sequence between the stop codon of the polypeptide and the α-factor
terminator in the yeast expression vector. pd.ΔNS3NS5.PJ was created by digesting
pd.ΔNS3NS5 #17 with ScaI and SphI to obtain a 4963 bp ScaI-SphI fragment. pd.NS5b3011
was digested with SphI and SalI to obtain a 321 bp SphI-SalI fragment, which gave the
"perfect junction" at the 3' end of the polypeptide. The ScaI-SphI and SphI-SalI fragments
were ligated into a pSP72 HindIII-SalI subcloning vector with synthetic oligos spanning
HindIII to ScaI (HS) for the "perfect junction" at the 5' end.

The region of synthetic sequence in pSP72 HindIII-SalI clone #6 was verified.
pSP72 HindIII-SalI clone #6 was digested with HindIII and BlnI or with BlnI and SalI to
obtain 2441 bp HindIII-BlnI and 2895 bp BlnI-SalI fragments, respectively. The
BamHI-HindIII ADH2/GAPDH promoter fragment was ligated with the HindIII-BlnI and
BlnI-SalI fragments into the pBS24.1 BamHI-SalI yeast expression vector.

pd.ΔNS3NS5.PJ.core121RT and pd.ΔNS3NS5.PJ.core173RT were generated; they
encode HCV core aa 1-121 at the C-terminus of the ΔNS3NS5 polypeptide (designated
pd.ΔNS3NS5.PJ.core121RT, SEQ ID NO:12) and core aa 1-173 at the C-terminus of the
ΔNS3NS5 polypeptide (designated pd.ΔNS3NS5.PJ.core173RT, SEQ ID NO:14). The
core sequence had aa 9 mutated from Lys to Arg and aa 11 mutated from Asn to Thr,
designated as core 121RT or 173RT.

pd.ΔNS3NS5.PJ.core121RT and pd.ΔNS3NS5.PJ.core173RT: To generate
pd.ΔNS3NS5.PJ.core121RT (Figure 17, SEQ ID NO:12) and pd.ΔNS3NS5.PJ.core173RT
(Figure 18, SEQ ID NO:14), NotI-SalI HCVcore121RT and HCVcore173RT fragments were
amplified by PCR from an E. coli expression plasmid, pSODCF2.HCVcore191RT #2, as
shown in Figure 16. Either the core 121RT NotI-SalI PCR product or the core 173RT
NotI-SalI PCR product was ligated into a pT7Blue2 PstI-SalI subcloning vector with
synthetic oligos (PN) spanning PstI to NotI. After sequence confirmation,
pT7Blue2core121RT clone #9 and pT7Blue2core173RT clone #11 were digested with PstI
and SalI to obtain 403 bp and 559 bp PstI-SalI fragments, respectively, for further cloning.

A 121 bp NotI-PstI fragment from pSP72 HindIII-SalI clone #6 was isolated as
described above during the cloning of pd.ΔNS3NS5.PJ. The NotI-PstI and PstI-SalI
fragments were assembled into a vector made by digesting pd.ΔNS3NS5.PJ clone #5
(described above) with NotI and SalI.

ΔNS3NS5 and Core 140 and Core 150: An HCV core epitope (HCV core aa 121-135) was
found that elicits CTLs in baboons. Since pd.ΔNS3NS5.PJ.core121RT ends right before
this potentially important epitope and was expressed better than the longer
pd.ΔNS3NS5.PJ.core173RT construct (Example 2), two intermediate constructs were
made that include this epitope and might give intermediate expression levels. The two
new constructs fused HCV core aa 1-140 or HCV core aa 1-150 to the C-terminus of
ΔNS3NS5.PJ.

pd.ΔNS3NS5.PJ.core140RT (Figure 21, SEQ ID NO:16) and
pd.ΔNS3NS5.PJ.core150RT (Figure 22, SEQ ID NO:18): As shown in Figure 20, a PstI-SalI
HCVcore140RT fragment and a PstI-SalI HCVcore150RT fragment were amplified by PCR
from pd.ΔNS3NS5.PJ.core173RT clone #16. Each HCV core PstI-SalI PCR product was
ligated into the pT7Blue2 PstI-SalI subcloning vector. After sequence confirmation,
pT7Blue2core140RT clone #22 and pT7Blue2core150RT clone #26 were digested with
PstI and SalI to obtain 460 bp and 490 bp PstI-SalI fragments, respectively, for further
cloning.
A 121 bp NotI-PstI fragment was isolated from pSP72 HindIII-SalI clone #6 (as
described above during the cloning of pd.ΔNS3NS5.PJ). The NotI-PstI and PstI-SalI
fragments were assembled into a vector made by digesting pd.ΔNS3NS5.PJ clone #5
(described above) with NotI and SalI.
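The PstI-SalI fragment sizes reported for the four core fusions are internally consistent with each additional core residue adding one 3 bp codon. The short Python check below illustrates that arithmetic. It assumes (this is not stated in the text) that all four PstI-SalI fragments carry the same non-core flanking sequence, about 40 bp by this reckoning (403 - 3 x 121), and differ only in core length; `predicted_psti_sali_bp` is a hypothetical name.

```python
# Sketch: check the reported PstI-SalI fragment sizes against codon arithmetic.
# Assumption: every fragment shares the same flanking (non-core) sequence, so the
# fragment grows by 3 bp for each additional core codon.
REFERENCE_CORE_END = 121  # core aa 1-121 (core121RT)
REFERENCE_BP = 403        # reported PstI-SalI fragment size for core121RT

# Reported sizes keyed by the last core residue in each fusion.
REPORTED_BP = {121: 403, 140: 460, 150: 490, 173: 559}


def predicted_psti_sali_bp(core_end: int) -> int:
    """Predicted fragment size for a core fusion ending at residue core_end."""
    return REFERENCE_BP + 3 * (core_end - REFERENCE_CORE_END)


if __name__ == "__main__":
    for core_end, reported in REPORTED_BP.items():
        predicted = predicted_psti_sali_bp(core_end)
        status = "matches" if predicted == reported else "differs"
        print(f"core 1-{core_end}: predicted {predicted} bp, "
              f"reported {reported} bp ({status})")
```

All four reported sizes match this prediction, which is a useful cross-check when transcribing fragment lengths from the original text.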


Example 2: Protein Expression 

Several of the constructs described herein, encoding the HCV-1 ΔNS3 to NS5 antigen
(aa 1242-3011), were expressed in yeast. S. cerevisiae strain AD3 was transformed with
pd.ΔNS3NS5 and checked for expression. A stained protein band at the expected
molecular weight of 194 kD was not observed (Figure 12). Strain AD3 was also
transformed with pd.ΔNS3NS5.PJ clone #5 and checked for expression. A protein band of
the expected molecular weight of 194 kD was detected (Figure 15). Strain AD3 was
transformed with pd.ΔNS3NS5.PJ.core121RT clone #6 and pd.ΔNS3NS5.PJ.core173RT
clone #15 and checked for expression. Protein bands of the expected molecular weights of
206 kD and 210 kD, respectively, were observed. Expression levels of the
pd.ΔNS3NS5.PJ.core173RT construct were much lower than those of the
pd.ΔNS3NS5.PJ.core121RT construct (see Figure 19). Thus, protein expression levels
decreased as the length of the fused HCV core sequence increased.

Strain AD3 was transformed with pd.ΔNS3NS5.PJ.core140RT clone #29 and
pd.ΔNS3NS5.PJ.core150RT clone #35 and checked for expression. Bands of the expected
molecular weights of 208 kD and 209 kD were seen by staining at levels close to those of
pd.ΔNS3NS5.PJ.core173RT (Figure 23).
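As a rough sanity check on the molecular weights quoted in this example, the Python sketch below counts residues (the initiator Met, HCV-1 aa 1242-3011, and any fused core aa 1-N) and multiplies by an assumed average residue mass of about 110 Da, a common rule of thumb rather than a value from this document. It ignores any residues contributed by cloning junctions. The estimate gives about 195 kD for the unfused NS3-NS5 polypeptide, close to the 194 kD stated, but runs roughly 2 to 4 kD above the values stated for the core fusions, so it is only an approximate check; a composition-based calculation from the actual sequence would be more accurate.

```python
# Rough molecular-weight estimate from residue counts.
# Assumption: ~110 Da average mass per residue (rule of thumb, not from the text);
# residues introduced at cloning junctions are ignored.
AVG_RESIDUE_DA = 110.0
NS_START, NS_END = 1242, 3011  # HCV-1 polyprotein residues carried by the construct


def residue_count(core_end: int = 0) -> int:
    """Initiator Met + aa 1242-3011 + (optionally) core aa 1..core_end."""
    return 1 + (NS_END - NS_START + 1) + core_end


def approx_kda(n_residues: int) -> float:
    """Approximate mass in kD for a polypeptide of n_residues residues."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


# Molecular weights stated in Example 2, keyed by the last fused core residue
# (0 means no core fusion).
STATED_KD = {0: 194, 121: 206, 140: 208, 150: 209, 173: 210}

if __name__ == "__main__":
    for core_end, stated in STATED_KD.items():
        n = residue_count(core_end)
        label = "no core" if core_end == 0 else f"core 1-{core_end}"
        print(f"{label}: {n} residues, ~{approx_kda(n):.1f} kD (stated {stated} kD)")
```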

Example 3: Eliciting Immune Responses 

A. Immunization

To evaluate the immunogenicity of the mutant NS polypeptides, studies using
guinea pigs, rabbits, mice, rhesus macaques and/or baboons are performed. The studies are
structured as follows: DNA immunization alone (single or multiple); DNA immunization
followed by protein immunization (boost); DNA immunization followed by protein
immunization; immunization by PLG particles. Immunization is intramuscular or
mucosal.

B. Humoral Immune Response 
The humoral immune response is checked in serum specimens from immunized
animals with anti-NS antibody ELISAs (enzyme-linked immunosorbent assays) at various
times post-immunization. Briefly, serum from immunized animals is screened for
antibodies directed against the NS or mutant NS proteins. Wells of ELISA microtiter
plates are coated overnight with the selected HCV protein and washed four times;
subsequently, blocking is done with PBS-0.2% Tween (Sigma). After removal of the
blocking solution, diluted mouse serum is added. Sera are tested at various dilutions.
Microtiter plates are washed and incubated with a secondary, peroxidase-coupled
anti-mouse IgG antibody (Pierce, Rockford, IL). ELISA plates are washed and
3,3',5,5'-tetramethylbenzidine (TMB; Pierce) is added per well. The optical density of
each well is measured. Titers are typically reported as the reciprocal of the dilution of
serum that gave a half-maximum optical density (O.D.). Similarly, generation of
neutralization of binding (NOB) antibodies can be measured by methods known in the art.
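The half-maximum titer described above can be computed by interpolating the dilution curve. The Python sketch below shows one common approach, offered as an assumption rather than the method used here: it takes the highest observed O.D. as the maximum and interpolates linearly in log10(dilution) between the two dilutions that bracket half of that value. The function name and example data are hypothetical.

```python
import math


def half_max_titer(dilutions: list[float], ods: list[float]) -> float:
    """Reciprocal dilution at which O.D. falls to half of the maximum observed O.D.

    dilutions: reciprocal dilutions in increasing order (e.g. 100 for 1:100).
    ods: optical densities measured at those dilutions.
    Interpolates linearly in log10(dilution) between the bracketing points.
    """
    if len(dilutions) != len(ods) or len(ods) < 2:
        raise ValueError("need matching lists with at least two points")
    target = max(ods) / 2.0
    pairs = zip(zip(dilutions, ods), zip(dilutions[1:], ods[1:]))
    for (d1, od1), (d2, od2) in pairs:
        if od1 >= target >= od2 and od1 != od2:
            frac = (od1 - target) / (od1 - od2)  # fraction of the way from d1 to d2
            log_d = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_d
    raise ValueError("half-maximum O.D. is not bracketed by the dilution series")


if __name__ == "__main__":
    # Hypothetical serum titration: 1:100, 1:1000, 1:10000.
    titer = half_max_titer([100, 1000, 10000], [2.0, 1.5, 0.5])
    print(f"half-maximum titer ~ {titer:.0f}")  # about 3162
```

Interpolating on a log scale reflects the serial dilution design; a fitted sigmoidal curve is a common alternative when more dilutions are tested.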

C. Cellular Immune Response
The frequency of specific cytotoxic T lymphocytes (CTL) is evaluated by a
standard chromium release assay of peptide-pulsed BALB/c mouse CD4 cells. Briefly,
spleen cells (effector cells, E) are obtained from immunized BALB/c mice, cultured,
restimulated, and assayed for CTL activity against HCV peptide-pulsed target cells.
Cytotoxic activity is measured in a standard ⁵¹Cr release assay.
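Results of a ⁵¹Cr release assay are conventionally expressed as percent specific lysis, computed from the counts released by targets incubated with effector cells, by targets alone (spontaneous release), and by detergent-lysed targets (maximum release). The text does not give the formula, so the Python sketch below uses that standard convention; the function name and the counts in the example are hypothetical.

```python
def percent_specific_lysis(experimental_cpm: float,
                           spontaneous_cpm: float,
                           maximum_cpm: float) -> float:
    """Standard 51Cr-release calculation:
    100 * (experimental - spontaneous) / (maximum - spontaneous).
    """
    if maximum_cpm <= spontaneous_cpm:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (experimental_cpm - spontaneous_cpm) / (maximum_cpm - spontaneous_cpm)


if __name__ == "__main__":
    # Hypothetical counts per minute at several effector:target (E:T) ratios.
    spontaneous, maximum = 500.0, 4500.0
    for et_ratio, cpm in [(100, 2500.0), (30, 1500.0), (10, 900.0)]:
        lysis = percent_specific_lysis(cpm, spontaneous, maximum)
        print(f"E:T {et_ratio}:1 -> {lysis:.1f}% specific lysis")
```

Subtracting spontaneous release in both numerator and denominator corrects for label that leaks from target cells in the absence of effectors.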


Example 4: Immunization with PLG-delivered DNA. 

The polylactide-co-glycolide (PLG) polymers are obtained from Boehringer
Ingelheim, U.S.A. The PLG polymer is RG505, which has a copolymer ratio of 50/50 and
a molecular weight of 65 kDa (manufacturer's data). Cationic microparticles with adsorbed
DNA are prepared using a modified solvent evaporation process, essentially as described in
Singh et al., Proc. Natl. Acad. Sci. USA (2000) 97:811-816. Briefly, the microparticles are
prepared by emulsifying a 5% w/v polymer solution in methylene chloride with PBS at
high speed using an IKA homogenizer. The primary emulsion is then added to distilled
water containing cetyl trimethyl ammonium bromide (CTAB) (0.5% w/v). This results in
the formation of a w/o/w emulsion, which is stirred at room temperature, allowing the
methylene chloride to evaporate. The resulting microparticles are washed in distilled water
by centrifugation and freeze dried. Following preparation, washing and collection, DNA is
adsorbed onto the microparticles by incubating cationic microparticles in a solution of
DNA. The microparticles are then separated by centrifugation, the pellet is washed with TE
buffer, and the microparticles are freeze dried, resuspended and administered to animals.
Antibody titers are measured by ELISA assays.

All patents, patent applications, and other publications mentioned herein, are 
hereby incorporated herein by reference in their entireties. 
What is claimed is:

1. An isolated mutant non-structural ("NS") HCV polypeptide comprising a
polypeptide having a mutation in the catalytic domain of NS3, wherein said mutation 
functionally disrupts the catalytic domain. 


2. The polypeptide of claim 1, wherein the mutation comprises a deletion. 

3. The polypeptide of claim 1, wherein the mutation comprises a substitution. 

4. The polypeptide of claim 1, wherein said NS polypeptide comprises NS3,
NS4 and NS5.

5. The polypeptide of claim 1, wherein said NS polypeptide consists of NS3,
NS4 and NS5.




6. The polypeptide of claim 1, wherein said NS polypeptide consists of NS3
and NS5.

7. The polypeptide of claim 6, wherein NS5 consists of NS5a.

8. The polypeptide of claim 6, wherein NS5 consists of NS5b. 

9. The polypeptide of claim 1, wherein said NS polypeptide consists of NS3
and NS4.

10. The polypeptide of claim 9, wherein NS4 consists of NS4a. 

11. The polypeptide of claim 9, wherein NS4 consists of NS4b.
12. The polypeptide of claim 4, further comprising a second viral polypeptide
that is not NS3, NS4, or NS5 of HCV. 

13. The polypeptide of claim 12, wherein the second viral polypeptide
comprises an HCV Core polypeptide ("C"), or fragment thereof.

14. The polypeptide of claim 13, wherein the C polypeptide is truncated. 

15. The polypeptide of claim 14, wherein the truncation is at amino acid 121. 


16. The polypeptide of claim 12, wherein the polypeptide further comprises an 
HCV envelope protein ("E"). 

17. The polypeptide of claim 16, wherein the E is E1.


18. The polypeptide of claim 16, wherein the E is E2. 

19. A composition comprising 
(a) the polypeptide of claim 1; and 

(b) a pharmaceutically acceptable excipient.

20. An isolated and purified polynucleotide which encodes the mutant HCV 
polypeptide according to claim 1. 



21. A composition comprising

(a) the isolated purified polynucleotide of claim 20; and 

(b) a pharmaceutically acceptable excipient. 

22. The composition of claim 21, wherein the polynucleotide is DNA.

23. The composition of claim 21, wherein the polynucleotide is in a plasmid.

24. An expression vector comprising the polynucleotide of claim 20.

25. An expression vector comprising the polynucleotide of SEQ ID NO:8.

26. A host cell comprising the polynucleotide of claim 20.

27. The host cell of claim 26, wherein the cell is a yeast cell.

28. The host cell of claim 26, wherein the cell is a mammalian cell.

29. The host cell of claim 26, wherein the cell is an insect cell.

30. The host cell of claim 26, wherein the cell is a plant cell.

31. The host cell of claim 26, wherein the polynucleotide comprises the
sequence of SEQ ID NO:8.

32. The polypeptide of claim 1, wherein the polypeptide further comprises SEQ

33. A method of preparing a mutant NS HCV polypeptide, wherein the method
comprises the steps of:

a. transforming a host cell with an expression vector according to
claim 24, under conditions wherein the polypeptide is expressed; and

b. isolating the polypeptide.
34. The method of claim 33, wherein the host cell is a yeast cell.

35. The method of claim 33, wherein the host cell is a mammalian cell. 

36. The method of claim 33, wherein the host cell is an insect cell. 

37. The method of claim 33, wherein the host cell is a plant cell. 

38. An antibody that specifically binds to a polypeptide of claim 1.

39. The antibody of claim 38, wherein the antibody is a monoclonal antibody. 

40. The antibody of claim 38, wherein the antibody is a purified polyclonal 
antibody. 

41. A method of eliciting an immune response in a subject, comprising the step
of administering to the subject a polypeptide of claim 1. 



42. A method of eliciting an immune response in a subject, comprising the step 
of administering to the subject a polynucleotide of claim 20.



ABSTRACT



Polypeptides comprising a mutant non-structural Hepatitis C virus (HCV) polypeptide,
useful in diagnostic and/or immunogenic compositions, are disclosed, in which the mutation
is an N-terminal mutation that functionally disrupts the catalytic domain of NS3.
Polynucleotides encoding these polypeptides, host cells transformed with the
polynucleotides, and methods of using the polypeptides and polynucleotides are also
disclosed.
FIGURE 2
FIGURE 3 - Page 1



1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 



81 GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA 
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT 



StuI

161 GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC

241 AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT

321 ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA

401 CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA

481 AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA

561 GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA

641 AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG

721 GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG

801 CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT

881 TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC

961 CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC

1041 CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG

1121 GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT

1201 CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT


1281 TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT 
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA 



1361 CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT 
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 
1441 TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA 



1521 CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 



1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 



1681 AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA 



1761 GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA 



1841 TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA 



+2 M A A 

EcoRI 

1921 GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCTGCAT 
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGACGTA 



+2 Y A A Q GYK VLVL NPS V A A TLGF GAY MSK 
2001 ATGCAGCTCA GGGCTATAAG GTGCTAGTAC TCAACCCCTC TGTTGCTGCA ACACTGGGCT TTGGTGCTTA CATGTCCAAG 
TACGTCGAGT CCCGATATTC CACGATCATG AGTTGGGGAG ACAACGACGT TGTGACCCGA AACCACGAAT GTACAGGTTC 



+2 AHGI DPN IRT GVRT ITT GSP ITYS TYG
2081 GCTCATGGGA TCGATCCTAA CATCAGGACC GGGGTGAGAA CAATTACCAC TGGCAGCCCC ATCACGTACT CCACCTACGG 
CGAGTACCCT AGCTAGGATT GTAGTCCTGG CCCCACTCTT GTTAATGGTG ACCGTCGGGG TAGTGCATGA GGTGGATGCC 



+2 KFL ADGG CSG GAY DIII CDE CHS TDA
2161 CAAGTTCCTT GCCGACGGCG GGTGCTCGGG GGGCGCTTAT GACATAATAA TTTGTGACGA GTGCCACTCC ACGGATGCCA 
GTTCAAGGAA CGGCTGCCGC CCACGAGCCC CCCGCGAATA CTGTATTATT AAACACTGCT CACGGTGAGG TGCCTACGGT 



+2 TSIL GIG TVLD QAE TAG ARLV V L A TAT
2241 CATCCATCTT GGGCATTGGC ACTGTCCTTG ACCAAGCAGA GACTGCGGGG GCGAGACTGG TTGTGCTCGC CACCGCCACC 
GTAGGTAGAA CCCGTAACCG TGACAGGAAC TGGTTCGTCT CTGACGCCCC CGCTCTGACC AACACGAGCG GTGGCGGTGG 



+2 P P G S VTV PHP NIEE V A L STT GEIP FYG
2321 CCTCCGGGCT CCGTCACTGT GCCCCATCCC AACATCGAGG AGGTTGCTCT GTCCACCACC GGAGAGATCC CTTTTTACGG 
GGAGGCCCGA GGCAGTGACA CGGGGTAGGG TTGTAGCTCC TCCAACGAGA CAGGTGGTGG CCTCTCTAGG GAAAAATGCC 



+2 K A I PLEV I K G GRH LIFC HSK KKC DEL
2401 CAAGGCTATC CCCCTCGAAG TAATCAAGGG GGGGAGACAT CTCATCTTCT GTCATTCAAA GAAGAAGTGC GACGAACTCG
GTTCCGATAG GGGGAGCTTC ATTAGTTCCC CCCCTCTGTA GAGTAGAAGA CAGTAAGTTT CTTGTTCACG CTGCTTGAGC 



+2 AAKL V A L GINA V A Y YRG LDVS VIP TSG
2481 CCGCAAAGCT GGTCGCATTG GGCATCAATG CCGTGGCCTA CTACCGCGGT CTTGACGTGT CCGTCATCCC GACCAGCGGC
GGCGTTTCGA CCAGCGTAAC CCGTAGTTAC GGCACCGGAT GATGGCGCCA GAACTGCACA GGCAGTAGGG CTGGTCGCCG 



+2 DVVV VAT DAL MTGY TGD FDS VIDC NTC
2561 GATGTTGTCG TCGTGGCAAC CGATGCCCTC ATGACCGGCT ATACCGGCGA CTTCGACTCG GTGATAGACT GCAATACGTG
CTACAACAGC AGCACCGTTG GCTACGGGAG TACTGGCCGA TATGGCCGCT GAAGCTGAGC CACTATCTGA CGTTATGCAC 
+2 VTQ TVDF SLD P T F T I E T I T L PQD A V S
2641 TGTCACCCAG ACAGTCGATT TCAGCCTTGA CCCTACCTTC ACCATTGAGA CAATCACGCT CCCCCAAGAT GCTGTCTCCC
ACAGTGGGTC TGTCAGCTAA AGTCGGAACT GGGATGGAAG TGGTAACTCT GTTAGTGCGA GGGGGTTCTA CGACAGAGGG



+2 RTQR RGR TGRG KPG IYR F V A P GER PSG
2721 GCACTCAACG TCGGGGCAGG ACTGGCAGGG GGAAGCCAGG CATCTACAGA TTTGTGGCAC CGGGGGAGCG CCCCTCCGGC
CGTGAGTTGC AGCCCCGTCC TGACCGTCCC CCTTCGGTCC GTAGATGTCT AAACACCGTG GCCCCCTCGC GGGGAGGCCG 



+2 MFDS SVL CEC YDAG CAW Y E L TPAE TTV
2801 ATGTTCGACT CGTCCGTCCT CTGTGAGTGC TATGACGCAG GCTGTGCTTG GTATGAGCTC ACGCCCGCCG AGACTACAGT 
TACAAGCTGA GCAGGCAGGA GACACTCACG ATACTGCGTC CGACACGAAC CATACTCGAG TGCGGGCGGC TCTGATGTCA



+2 RLR A Y M N TPG LPV CQDH LEF WEG VFT 

StuI 

2881 TAGGCTACGA GCGTACATGA ACACCCCGGG GCTTCCCGTG TGCCAGGACC ATCTTGAATT TTGGGAGGGC GTCTTTACAG 
ATCCGATGCT CGCATGTACT TGTGGGGCCC CGAAGGGCAC ACGGTCCTGG TAGAACTTAA AACCCTCCCG CAGAAATGTC 



+2 GLTH IDA HFLS QTK QSG ENLP YLV AYQ
StuI 

2961 GCCTCACTCA TATAGATGCC CACTTTCTAT CCCAGACAAA GCAGAGTGGG GAGAACCTTC CTTACCTGGT AGCGTACCAA 
CGGAGTGAGT ATATCTACGG GTGAAAGATA GGGTCTGTTT CGTCTCACCC CTCTTGGAAG GAATGGACCA TCGCATGGTT 



+2 ATVC A R A QAP PPSW DQM WKC LIRL KPT
3041 GCCACCGTGT GCGCTAGGGC TCAAGCCCCT CCCCCATCGT GGGACCAGAT GTGGAAGTGT TTGATTCGCC TCAAGCCCAC 
CGGTGGCACA CGCGATCCCG AGTTCGGGGA GGGGGTAGCA CCCTGGTCTA CACCTTCACA AACTAAGCGG AGTTCGGGTG 



+2 LHG PTPL LYR LGA VQNE I T L THP V T K 
3121 CCTCCATGGG CCAACACCCC TGCTATACAG ACTGGGCGCT GTTCAGAATG AAATCACCCT GACGCACCCA GTCACCAAAT 
GGAGGTACCC GGTTGTGGGG ACGATATGTC TGACCCGCGA CAAGTCTTAC TTTAGTGGGA CTGCGTGGGT CAGTGGTTTA 



+2 YIMT CMS ADLE V V T STW VLVG GVL A A L
3201 ACATCATGAC ATGCATGTCG GCCGACCTGG AGGTCGTCAC GAGCACCTGG GTGCTCGTTG GCGGCGTCCT GGCTGCTTTG 
TGTAGTACTG TACGTACAGC CGGCTGGACC TCCAGCAGTG CTCGTGGACC CACGAGCAAC CGCCGCAGGA CCGACGAAAC 



+2 AAYC LST GCV VIVG RVV LSG KPAI IPD
3281 GCCGCGTATT GCCTGTCAAC AGGCTGCGTG GTCATAGTGG GCAGGGTCGT CTTGTCCGGG AAGCCGGCAA TCATACCTGA 
CGGCGCATAA CGGACAGTTG TCCGACGCAC CAGTATCACC CGTCCCAGCA GAACAGGCCC TTCGGCCGTT AGTATGGACT 



+2 REV LYRE FDE MEE CSQH LPY IEQ GMM
3361 CAGGGAAGTC CTCTACCGAG AGTTCGATGA GATGGAAGAG TGCTCTCAGC ACTTACCGTA CATCGAGCAA GGGATGATGC
GTCCCTTCAG GAGATGGCTC TCAAGCTACT CTACCTTCTC ACGAGAGTCG TGAATGGCAT GTAGCTCGTT CCCTACTACG 



+2 LAEQ FKQ KALG L L Q TAS RQAE VIA P A V
3441 TCGCCGAGCA GTTCAAGCAG AAGGCCCTCG GCCTCCTGCA GACCGCGTCC CGTCAGGCAG AGGTTATCGC CCCTGCTGTC
AGCGGCTCGT CAAGTTCGTC TTCCGGGAGC CGGAGGACGT CTGGCGCAGG GCAGTCCGTC TCCAATAGCG GGGACGACAG



+2 QTNW QKL ETF WAKH MWN FIS GIQY LAG
3521 CAGACCAACT GGCAAAAACT CGAGACCTTC TGGGCGAAGC ATATGTGGAA CTTCATCAGT GGGATACAAT ACTTGGCGGG 
GTCTGGTTGA CCGTTTTTGA GCTCTGGAAG ACCCGCTTCG TATACACCTT GAAGTAGTCA CCCTATGTTA TGAACCGCCC 



+2 LST LPGN PAI ASL MAFT AAV TSP LTT 
3601 CTTGTCAACG CTGCCTGGTA ACCCCGCCAT TGCTTCATTG ATGGCTTTTA CAGCTGCTGT CACCAGCCCA CTAACCACTA 
GAACAGTTGC GACGGACCAT TGGGGCGGTA ACGAAGTAAC TACCGAAAAT GTCGACGACA GTGGTCGGGT GATTGGTGAT 



+2 SQTL LFN ILGG WVA AQL AAPG AAT AFV
3681 GCCAAACCCT CCTCTTCAAC ATATTGGGGG GGTGGGTGGC TGCCCAGCTC GCCGCCCCCG GTGCCGCTAC TGCCTTTGTG 
CGGTTTGGGA GGAGAAGTTG TATAACCCCC CCACCCACCG ACGGGTCGAG CGGCGGGGGC CACGGCGATG ACGGAAACAC 
+2 G A G L A G A A I G SVGL G K V LID ILAG Y G A
3761 GGCGCTGGCT TAGCTGGCGC CGCCATCGGC AGTGTTGGAC TGGGGAAGGT CCTCATAGAC ATCCTTGCAG GGTATGGCGC 
CCGCGACCGA ATCGACCGCG GCGGTAGCCG TCACAACCTG ACCCCTTCCA GGAGTATCTG TAGGAACGTC CCATACCGCG 



+2 G V A G A L V AFK IMS GEVP STE DLV NLL
3841 GGGCGTGGCG GGAGCTCTTG TGGCATTCAA GATCATGAGC GGTGAGGTCC CCTCCACGGA GGACCTGGTC AATCTACTGC
CCCGCACCGC CCTCGAGAAC ACCGTAAGTT CTAGTACTCG CCACTCCAGG GGAGGTGCCT CCTGGACCAG TTAGATGACG 



+2 PAIL SPG ALVV GVV CAA ILRR HVG PGE
3921 CCGCCATCCT CTCGCCCGGA GCCCTCGTAG TCGGCGTGGT CTGTGCAGCA ATACTGCGCC GGCACGTTGG CCCGGGCGAG 
GGCGGTAGGA GAGCGGGCCT CGGGAGCATC AGCCGCACCA GACACGTCGT TATGACGCGG CCGTGCAACC GGGCCCGCTC 



+2 GAVQ WMN RLI AFAS RGN HVS PTHY VPE
4001 GGGGCAGTGC AGTGGATGAA CCGGCTGATA GCCTTCGCCT CCCGGGGGAA CCATGTTTCC CCCACGCACT ACGTGCCGGA 
CCCCGTCACG TCACCTACTT GGCCGACTAT CGGAAGCGGA GGGCCCCCTT GGTACAAAGG GGGTGCGTGA TGCACGGCCT



+2 SDA A A R V T A I LSS LTVT QLL RRL HQW
4081 GAGCGATGCA GCTGCCCGCG TCACTGCCAT ACTCAGCAGC CTCACTGTAA CCCAGCTCCT GAGGCGACTG CACCAGTGGA
CTCGCTACGT CGACGGGCGC AGTGACGGTA TGAGTCGTCG GAGTGACATT GGGTCGAGGA CTCCGCTGAC GTGGTCACCT 



+2ISSE CTT PCSG SWL RDI WDWI CEV LSD
4161 TAAGCTCGGA GTGTACCACT CCATGCTCCG GTTCCTGGCT AAGGGACATC TGGGACTGGA TATGCGAGGT GTTGAGCGAC 
ATTCGAGCCT CACATGGTGA GGTACGAGGC CAAGGACCGA TTCCCTGTAG ACCCTGACCT ATACGCTCCA CAACTCGCTG 



+ 2FKTW LKA KLM PQLP GIP FVS CQRG YKG 

BamHI 

4241 TTTAAGACCT GGCTAAAAGC TAAGCTCATG CCACAGCTGC CTGGGATCCC CTTTGTGTCC TGCCAGCGCG GGTATAAGGG 
AAATTCTGGA CCGATTTTCG ATTCGAGTAC GGTGTCGACG GACCCTAGGG GAAACACAGG ACGGTCGCGC CCATATTCCC 



+ 2 VWR GDGI MHT RCH CGAE ITG HVK NGT 
4321 GGTCTGGCGA GGGGACGGCA TCATGCACAC TCGCTGCCAC TGTGGAGCTG AGATCACTGG ACATGTCAAA AACGGGACGA 
CCAGACCGCT CCCCTGCCGT AGTACGTGTG AGCGACGGTG ACACCTCGAC TCTAGTGACC TGTACAGTTT TTGCCCTGCT 



+2MRIV GPR TCRN MWS GTF PINA YTT GPC
4401 TGAGGATCGT CGGTCCTAGG ACCTGCAGGA ACATGTGGAG TGGGACCTTC CCCATTAATG CCTACACCAC GGGCCCCTGT
ACTCCTAGCA GCCAGGATCC TGGACGTCCT TGTACACCTC ACCCTGGAAG GGGTAATTAC GGATGTGGTG CCCGGGGACA 



+ 2TPLP APN YTF ALWR VSA EEY VEIR QVG 
4481 ACCCCCCTTC CTGCGCCGAA CTACACGTTC GCGCTATGGA GGGTGTCTGC AGAGGAATAC GTGGAGATAA GGCAGGTGGG
TGGGGGGAAG GACGCGGCTT GATGTGCAAG CGCGATACCT CCCACAGACG TCTCCTTATG CACCTCTATT CCGTCCACCC 



+ 2 DFH YVTG MTT DNL KCPC QVP SPE FFT 
4561 GGACTTCCAC TACGTGACGG GTATGACTAC TGACAATCTT AAATGCCCGT GCCAGGTCCC ATCGCCCGAA TTTTTCACAG
CCTGAAGGTG ATGCACTGCC CATACTGATG ACTGTTAGAA TTTACGGGCA CGGTCCAGGG TAGCGGGCTT AAAAAGTGTC 



+2ELDG VRL HRFA PPC KPL LREE VSF RVG 
4641 AATTGGACGG GGTGCGCCTA CATAGGTTTG CGCCCCCCTG CAAGCCCTTG CTGCGGGAGG AGGTATCATT CAGAGTAGGA
TTAACCTGCC CCACGCGGAT GTATCCAAAC GCGGGGGGAC GTTCGGGAAC GACGCCCTCC TCCATAGTAA GTCTCATCCT 



+ 2LHEY PVG SQL PCEP EPD VAV LTSM LTD 
4721 CTCCACGAAT ACCCGGTAGG GTCGCAATTA CCTTGCGAGC CCGAACCGGA CGTGGCCGTG TTGACGTCCA TGCTCACTGA 
GAGGTGCTTA TGGGCCATCC CAGCGTTAAT GGAACGCTCG GGCTTGGCCT GCACCGGCAC AACTGCAGGT ACGAGTGACT 



+2 PSH ITAE AAG RRL ARGS PPS VAS SSA
4801 TCCCTCCCAT ATAACAGCAG AGGCGGCCGG GCGAAGGTTG GCGAGGGGAT CACCCCCCTC TGTGGCCAGC TCCTCGGCTA
AGGGAGGGTA TATTGTCGTC TCCGCCGGCC CGCTTCCAAC CGCTCCCCTA GTGGGGGGAG ACACCGGTCG AGGAGCCGAT 
+2SQLS APS LKAT CTA NHD SPDA ELI EAN
4881 GCCAGCTATC CGCTCCATCT CTCAAGGCAA CTTGCACCGC TAACCATGAC TCCCCTGATG CTGAGCTCAT AGAGGCCAAC 
CGGTCGATAG GCGAGGTAGA GAGTTCCGTT GAACGTGGCG ATTGGTACTG AGGGGACTAC GACTCGAGTA TCTCCGGTTG 



+2LLWR*QEM GGN ITRV ESE NKV VILD SFD

4961 CTCCTATGGA GGCAGGAGAT GGGCGGCAAC ATCACCAGGG TTGAGTCAGA AAACAAAGTG GTGATTCTGG ACTCCTTCGA 
GAGGATACCT CCGTCCTCTA CCCGCCGTTG TAGTGGTCCC AACTCAGTCT TTTGTTTCAC CACTAAGACC TGAGGAAGCT 



+2 PLV AEED ERE ISV PAEI LRK SRR FAQ
5041 TCCGCTTGTG GCGGAGGAGG ACGAGCGGGA GATCTCCGTA CCCGCAGAAA TCCTGCGGAA GTCTCGGAGA TTCGCCCAGG 
AGGCGAACAC CGCCTCCTCC TGCTCGCCCT CTAGAGGCAT GGGCGTCTTT AGGACGCCTT CAGAGCCTCT AAGCGGGTCC 



+2ALPV WAR PDYN PPL VET WKKP DYE PPV
5121 CCCTGCCCGT TTGGGCGCGG CCGGACTATA ACCCCCCGCT AGTGGAGACG TGGAAAAAGC CCGACTACGA ACCACCTGTG 
GGGACGGGCA AACCCGCGCC GGCCTGATAT TGGGGGGCGA TCACCTCTGC ACCTTTTTCG GGCTGATGCT TGGTGGACAC 



+2VHGC PLP PPK SPPV PPP RKK RTVV LTE
5201 GTCCATGGCT GCCCGCTTCC ACCTCCAAAG TCCCCTCCTG TGCCTCCGCC TCGGAAGAAG CGGACGGTGG TCCTCACTGA 
CAGGTACCGA CGGGCGAAGG TGGAGGTTTC AGGGGAGGAC ACGGAGGCGG AGCCTTCTTC GCCTGCCACC AGGAGTGACT 



+ 2 STL STAL AEL ATR SFGS SST SGI TGD 
5281 ATCAACCCTA TCTACTGCCT TGGCCGAGCT CGCCACCAGA AGCTTTGGCA GCTCCTCAAC TTCCGGCATT ACGGGCGACA 
TAGTTGGGAT AGATGACGGA ACCGGCTCGA GCGGTGGTCT TCGAAACCGT CGAGGAGTTG AAGGCCGTAA TGCCCGCTGT 



+2NTTT SSE PAPS GCP PDS DAES YSS MPP
5361 ATACGACAAC ATCCTCTGAG CCCGCCCCTT CTGGCTGCCC CCCCGACTCC GACGCTGAGT CCTATTCCTC CATGCCCCCC 
TATGCTGTTG TAGGAGACTC GGGCGGGGAA GACCGACGGG GGGGCTGAGG CTGCGACTCA GGATAAGGAG GTACGGGGGG 



+2 LEGE PGD PDL SDGS WST VSS EANA EDV

BamHI 



5441 CTGGAGGGGG AGCCTGGGGA TCCGGATCTT AGCGACGGGT CATGGTCAAC GGTCAGTAGT GAGGCCAACG CGGAGGATGT
GACCTCCCCC TCGGACCCCT AGGCCTAGAA TCGCTGCCCA GTACCAGTTG CCAGTCATCA CTCCGGTTGC GCCTCCTACA 



+2 VCC SMSY SWT GAL VTPC AAE EQK LPI
5521 CGTGTGCTGC TCAATGTCTT ACTCTTGGAC AGGCGCACTC GTCACCCCGT GCGCCGCGGA AGAACAGAAA CTGCCCATCA 
GCACACGACG AGTTACAGAA TGAGAACCTG TCCGCGTGAG CAGTGGGGCA CGCGGCGCCT TCTTGTCTTT GACGGGTAGT 



+2NALS NSL LRHH NLV YST TSRS ACQ RQK 
5601 ATGCACTAAG CAACTCGTTG CTACGTCACC ACAATTTGGT GTATTCCACC ACCTCACGCA GTGCTTGCCA AAGGCAGAAG 
TACGTGATTC GTTGAGCAAC GATGCAGTGG TGTTAAACCA CATAAGGTGG TGGAGTGCGT CACGAACGGT TTCCGTCTTC 



+2KVTF DRL QVL DSHY QDV LKE VKAA ASK
5681 AAAGTCACAT TTGACAGACT GCAAGTTCTG GACAGCCATT ACCAGGACGT ACTCAAGGAG GTTAAAGCAG CGGCGTCAAA 
TTTCAGTGTA AACTGTCTGA CGTTCAAGAC CTGTCGGTAA TGGTCCTGCA TGAGTTCCTC CAATTTCGTC GCCGCAGTTT 



+2 VKA NLLS VEE ACS LTPP HSA KSK FGY
5761 AGTGAAGGCT AACTTGCTAT CCGTAGAGGA AGCTTGCAGC CTGACGCCCC CACACTCAGC CAAATCCAAG TTTGGTTATG
TCACTTCCGA TTGAACGATA GGCATCTCCT TCGAACGTCG GACTGCGGGG GTGTGAGTCG GTTTAGGTTC AAACCAATAC 



+2GAKD VRC HARK AVT HIN SVWK DLL EDN
5841 GGGCAAAAGA CGTCCGTTGC CATGCCAGAA AGGCCGTAAC CCACATCAAC TCCGTGTGGA AAGACCTTCT GGAAGACAAT 
CCCGTTTTCT GCAGGCAACG GTACGGTCTT TCCGGCATTG GGTGTAGTTG AGGCACACCT TTCTGGAAGA CCTTCTGTTA 



+ 2VTPI DTT IMA KNEV FCV QPE KGGR KPA 
5921 GTAACACCAA TAGACACTAC CATCATGGCT AAGAACGAGG TTTTCTGCGT TCAGCCTGAG AAGGGGGGTC GTAAGCCAGC 
CATTGTGGTT ATCTGTGATG GTAGTACCGA TTCTTGCTCC AAAAGACGCA AGTCGGACTC TTCCCCCCAG CATTCGGTCG 
+2 RLI VFPD LGV RVC EKMA LYD VVT KLP
6001 TCGTCTCATC GTGTTCCCCG ATCTGGGCGT GCGCGTGTGC GAAAAGATGG CTTTGTACGA CGTGGTTACA AAGCTCCCCT 
AGCAGAGTAG CACAAGGGGC TAGACCCGCA CGCGCACACG CTTTTCTACC GAAACATGCT GCACCAATGT TTCGAGGGGA 



+2 LAVM GSS YGFQ YSP GQR VEFL VQA WKS

EcoRI 



6081 TGGCCGTGAT GGGAAGCTCC TACGGATTCC AATACTCACC AGGACAGCGG GTTGAATTCC TCGTGCAAGC GTGGAAGTCC 
ACCGGCACTA CCCTTCGAGG ATGCCTAAGG TTATGAGTGG TCCTGTCGCC CAACTTAAGG AGCACGTTCG CACCTTCAGG



+2KKTP MGF SYD TRCF DST VTE SDIR TEE
6161 AAGAAAACCC CAATGGGGTT CTCGTATGAT ACCCGCTGCT TTGACTCCAC AGTCACTGAG AGCGACATCC GTACGGAGGA 
TTCTTTTGGG GTTACCCCAA GAGCATACTA TGGGCGACGA AACTGAGGTG TCAGTGACTC TCGCTGTAGG CATGCCTCCT 



+ 2 AIY QCCD LDP QAR VAIK SLT ERL YVG 
6241 GGCAATCTAC CAATGTTGTG ACCTCGACCC CCAAGCCCGC GTGGCCATCA AGTCCCTCAC CGAGAGGCTT TATGTTGGGG 
CCGTTAGATG GTTACAACAC TGGAGCTGGG GGTTCGGGCG CACCGGTAGT TCAGGGAGTG GCTCTCCGAA ATACAACCCC 



+2GPLT NSR GENC GYR RCR ASGV LTT SCG
6321 GCCCTCTTAC CAATTCAAGG GGGGAGAACT GCGGCTATCG CAGGTGCCGC GCGAGCGGCG TACTGACAAC TAGCTGTGGT 
CGGGAGAATG GTTAAGTTCC CCCCTCTTGA CGCCGATAGC GTCCACGGCG CGCTCGCCGC ATGACTGTTG ATCGACACCA 



+2NTLT CYI KAR AACR AAG LQD CTML VCG
6401 AACACCCTCA CTTGCTACAT CAAGGCCCGG GCAGCCTGTC GAGCCGCAGG GCTCCAGGAC TGCACCATGC TCGTGTGTGG 
TTGTGGGAGT GAACGATGTA GTTCCGGGCC CGTCGGACAG CTCGGCGTCC CGAGGTCCTG ACGTGGTACG AGCACACACC 



+2 DDL VVIC ESA GVQ EDAA SLR AFT EAM
6481 CGACGACTTA GTCGTTATCT GTGAAAGCGC GGGGGTCCAG GAGGACGCGG CGAGCCTGAG AGCCTTCACG GAGGCTATGA 
GCTGCTGAAT CAGCAATAGA CACTTTCGCG CCCCCAGGTC CTCCTGCGCC GCTCGGACTC TCGGAAGTGC CTCCGATACT 



+2TRYS APP GDPP QPE YDL ELIT SCS SNV 
6561 CCAGGTACTC CGCCCCCCCT GGGGACCCCC CACAACCAGA ATACGACTTG GAGCTCATAA CATCATGCTC CTCCAACGTG 
GGTCCATGAG GCGGGGGGGA CCCCTGGGGG GTGTTGGTCT TATGCTGAAC CTCGAGTATT GTAGTACGAG GAGGTTGCAC 



+2SVAH DGA GKR VYYL TRD PTT PLAR AAW
6641 TCAGTCGCCC ACGACGGCGC TGGAAAGAGG GTCTACTACC TCACCCGTGA CCCTACAACC CCCCTCGCGA GAGCTGCGTG 
AGTCAGCGGG TGCTGCCGCG ACCTTTCTCC CAGATGATGG AGTGGGCACT GGGATGTTGG GGGGAGCGCT CTCGACGCAC 



+2 ETA RHTP VNS WLG NIIM FAP TLW ARM
6721 GGAGACAGCA AGACACACTC CAGTCAATTC CTGGCTAGGC AACATAATCA TGTTTGCCCC CACACTGTGG GCGAGGATGA 
CCTCTGTCGT TCTGTGTGAG GTCAGTTAAG GACCGATCCG TTGTATTAGT ACAAACGGGG GTGTGACACC CGCTCCTACT 



+ 2ILMT HFF SVLI ARD QLE QALD CEI YGA 
6801 TACTGATGAC CCATTTCTTT AGCGTCCTTA TAGCCAGGGA CCAGCTTGAA CAGGCCCTCG ATTGCGAGAT CTACGGGGCC 
ATGACTACTG GGTAAAGAAA TCGCAGGAAT ATCGGTCCCT GGTCGAACTT GTCCGGGAGC TAACGCTCTA GATGCCCCGG 



+2CYSI EPL DLP PIIQ RLH GLS AFSL HSY
6881 TGCTACTCCA TAGAACCACT GGATCTACCT CCAATCATTC AAAGACTCCA TGGCCTCAGC GCATTTTCAC TCCACAGTTA 
ACGATGAGGT ATCTTGGTGA CCTAGATGGA GGTTAGTAAG TTTCTGAGGT ACCGGAGTCG CGTAAAAGTG AGGTGTCAAT 



+2 SPG EINR VAA CLR KLGV PPL RAW RHR
6961 CTCTCCAGGT GAAATCAATA GGGTGGCCGC ATGCCTCAGA AAACTTGGGG TACCGCCCTT GCGAGCTTGG AGACACCGGG 
GAGAGGTCCA CTTTAGTTAT CCCACCGGCG TACGGAGTCT TTTGAACCCC ATGGCGGGAA CGCTCGAACC TCTGTGGCCC 



+2ARSV RAR LLAR GGR AAI CGKY LFN WAV
7041 CCCGGAGCGT CCGCGCTAGG CTTCTGGCCA GAGGAGGCAG GGCTGCCATA TGTGGCAAGT ACCTCTTCAA CTGGGCAGTA
GGGCCTCGCA GGCGCGATCC GAAGACCGGT CTCCTCCGTC CCGACGGTAT ACACCGTTCA TGGAGAAGTT GACCCGTCAT 
+2RTKL KLT PIAA AGQ LDL SGW FTAG YSG
7121 AGAACAAAGC TCAAACTCAC TCCAATAGCG GCCGCTGGCC AGCTGGACTT GTCCGGCTGG TTCACGGCTG GCTACAGCGG
TCTTGTTTCG AGTTTGAGTG AGGTTATCGC CGGCGACCGG TCGACCTGAA CAGGCCGACC AAGTGCCGAC CGATGTCGCC 



+2 GDI YHSV SHA RPR WIWF CLL LLA AGV
7201 GGGAGACATT TATCACAGCG TGTCTCATGC CCGGCCCCGC TGGATCTGGT TTTGCCTACT CCTGCTTGCT GCAGGGGTAG
CCCTCTGTAA ATAGTGTCGC ACAGAGTACG GGCCGGGGCG ACCTAGACCA AAACGGATGA GGACGAACGA CGTCCCCATC 



+ 2GIYL LPN R 
7281 GCATCTACCT CCTCCCCAAC CGATGAAGGT TGGGGTAAAC ACTCCGGCCT AAAAAAAAAA AAAAATCTAG AAAGGCGCGC 
CGTAGATGGA GGAGGGGTTG GCTACTTCCA ACCCCATTTG TGAGGCCGGA TTTTTTTTTT TTTTTAGATC TTTCCGCGCG 



BamHI MluI



7361 CAAGATATCA AGGATCCACT ACGCGTTAGA GCTCGCTGAT CAGCCTCGAC TGTGCCTTCT AGTTGCCAGC CATCTGTTGT
GTTCTATAGT TCCTAGGTGA TGCGCAATCT CGAGCGACTA GTCGGAGCTG ACACGGAAGA TCAACGGTCG GTAGACAACA

7441 TTGCCCCTCC CCCGTGCCTT CCTTGACCCT GGAAGGTGCC ACTCCCACTG TCCTTTCCTA ATAAAATGAG GAAATTGCAT
AACGGGGAGG GGGCACGGAA GGAACTGGGA CCTTCCACGG TGAGGGTGAC AGGAAAGGAT TATTTTACTC CTTTAACGTA

7521 CGCATTGTCT GAGTAGGTGT CATTCTATTC TGGGGGGTGG GGTGGGGCAG GACAGCAAGG GGGAGGATTG GGAAGACAAT
GCGTAACAGA CTCATCCACA GTAAGATAAG ACCCCCCACC CCACCCCGTC CTGTCGTTCC CCCTCCTAAC CCTTCTGTTA

7601 AGCAGGCATG CTGGGGAGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT
TCGTCCGTAC GACCCCTCGA GAAGGCGAAG GAGCGAGTGA CTGAGCGACG CGAGCCAGCA AGCCGACGCC GCTCGCCATA

7681 CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG
GTCGAGTGAG TTTCCGCCAT TATGCCAATA GGTGTCTTAG TCCCCTATTG CGTCCTTTCT TGTACACTCG TTTTCCGGTC

7761 CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA
GTTTTCCGGT CCTTGGCATT TTTCCGGCGC AACGACCGCA AAAAGGTATC CGAGGCGGGG GGACTGCTCG TAGTGTTTTT

7841 TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC
AGCTGCGAGT TCAGTCTCCA CCGCTTTGGG CTGTCCTGAT ATTTCTATGG TCCGCAAAGG GGGACCTTCG AGGGAGCACG

7921 GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC
CGAGAGGACA AGGCTGGGAC GGCGAATGGC CTATGGACAG GCGGAAAGAG GGAAGCCCTT CGCACCGCGA AAGAGTTACG

8001 TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA
AGTGCGACAT CCATAGAGTC AAGCCACATC CAGCAAGCGA GGTTCGACCC GACACACGTG CTTGGGGGGC AAGTCGGGCT

8081 CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG
GGCGACGCGG AATAGGCCAT TGATAGCAGA ACTCAGGTTG GGCCATTCTG TGCTGAATAG CGGTGACCGT CGTCGGTGAC

8161 GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA
CATTGTCCTA ATCGTCTCGC TCCATACATC CGCCACGATG TCTCAAGAAC TTCACCACCG GATTGATGCC GATGTGATCT

8241 AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA
TCCTGTCATA AACCATAGAC GCGAGACGAC TTCGGTCAAT GGAAGCCTTT TTCTCAACCA TCGAGAACTA GGCCGTTTGT


8321 AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 
TTGGTGGCGA CCATCGCCAC CAAAAAAACA AACGTTCGTC GTCTAATGCG CGTCTTTTTT TCCTAGAGTT CTTCTAGGAA 



8401 TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG
ACTAGAAAAG ATGCCCCAGA CTGCGAGTCA CCTTGCTTTT GAGTGCAATT CCCTAAAACC AGTACTCTAA TAGTTTTTCC 
8481 ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG

TAGAAGTGGA TCTAGGAAAA TTTAATTTTT ACTTCAAAAT TTAGTTAGAT TTCATATATA CTCATTTGAA CCAGACTGTC

8561 TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 
AATGGTTACG AATTAGTCAC TCCGTGGATA GAGTCGCTAG ACAGATAAAG CAAGTAGGTA TCAACGGACT GAGGGGCAGC

8641 TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT 
ACATCTATTG ATGCTATGCC CTCCCGAATG GTAGACCGGG GTCACGACGT TACTATGGCG CTCTGGGTGC GAGTGGCCGA

8721 CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA 
GGTCTAAATA GTCGTTATTT GGTCGGTCGG CCTTCCCGGC TCGCGTCTTC ACCAGGACGT TGAAATAGGC GGAGGTAGGT 

8801 GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 

CAGATAATTA ACAACGGCCC TTCGATCTCA TTCATCAAGC GGTCAATTAT CAAACGCGTT GCAACAACGG TAACGATGTC

8881 GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC
CGTAGCACCA CAGTGCGAGC AGCAAACCAT ACCGAAGTAA GTCGAGGCCA AGGGTTGCTA GTTCCGCTCA ATGTACTAGG

8961 CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT
GGGTACAACA CGTTTTTTCG CCAATCGAGG AAGCCAGGAG GCTAGCAACA GTCTTCATTC AACCGGCGTC ACAATAGTGA

9041 CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA
GTACCAATAC CGTCGTGACG TATTAAGAGA ATGACAGTAC GGTAGGCATT CTACGAAAAG ACACTGACCA CTCATGAGTT

9121 CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT
GGTTCAGTAA GACTCTTATC ACATACGCCG CTGGCTCAAC GAGAACGGGC CGCAGTTATG CCCTATTATG GCGCGGTGTA

9201 AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC
TCGTCTTGAA ATTTTCACGA GTAGTAACCT TTTGCAAGAA GCCCCGCTTT TGAGAGTTCC TAGAATGGCG ACAACTCTAG



9281 CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA

GTCAAGCTAC ATTGGGTGAG CACGTGGGTT GACTAGAAGT CGTAGAAAAT GAAAGTGGTC GCAAAGACCC ACTCGTTTTT





9601 GTATCACGAG GCCCTTTCGT C 
CATAGTGCTC CGGGAAAGCA G 



FIGURE 4 



Sfwl(211) 




delNS35 ORF 
1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 



81 GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT

StuI
161 GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC

241 AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT

321 ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA

401 CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA

481 AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA

561 GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA

641 AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG

721 GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG

801 CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT

881 TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC

961 CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC

1041 CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG

1121 GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT

1201 CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT



1281 TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT 
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA 



1361 CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 
1441 TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA 

1521 CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 



1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 



1681 AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA

1761 GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA

1841 TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA



+2 



M A A 



EcoRI 



1921 CACACACTRA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCTGCAT
GTGTGTGAYT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGACGTA

+2YAAQ GYK VLVL NPS VAA TLGF GAY MSK
2001 ATGCAGCTCA GGGCTATAAG GTGCTAGTAC TCAACCCCTC TGTTGCTGCA ACACTGGGCT TTGGTGCTTA CATGTCCAAG
TACGTCGAGT CCCGATATTC CACGATCATG AGTTGGGGAG ACAACGACGT TGTGACCCGA AACCACGAAT GTACAGGTTC

+2AHG IDPN IRT GVRT ITT GSP ITYS TYG
2081 GCTCATGGGA TCGATCCTAA CATCAGGACC GGGGTGAGAA CAATTACCAC TGGCAGCCCC ATCACGTACT CCACCTACGG
CGAGTACCCT AGCTAGGATT GTAGTCCTGG CCCCACTCTT GTTAATGGTG ACCGTCGGGG TAGTGCATGA GGTGGATGCC

2161 PAACTTCCTT GCCGACGGCG GGTGCTCGGG GGGCGCTTAT GACATAATAA TTTGTGACGA GTGCCACTCC ACGGATGCCA
G^SIgSI CGGCTGCCGC CCACGAGCCC CCCGCGAATA CTGTATTATT AAACACTGCT CACGGTGAGG TGCCTACGGT

__ CT7 r t C TVLD QAE TAG A R L V VLA TAT 

+2 VTQ TVDF SLD PTF TIE TIT LPQD AVS
2641 TGTCACCCAG ACAGTCGATT TCAGCCTTGA CCCTACCTTC ACCATTGAGA CAATCACGCT CCCCCAAGAT GCTGTCTCCC
ACAGTGGGTC TGTCAGCTAA AGTCGGAACT GGGATGGAAG TGGTAACTCT GTTAGTGCGA GGGGGTTCTA CGACAGAGGG 



+2RTQR RGR TGRG KPG IYRF VAP GER PSG
2721 GCACTCAACG TCGGGGCAGG ACTGGCAGGG GGAAGCCAGG CATCTACAGA TTTGTGGCAC CGGGGGAGCG CCCCTCCGGC
CGTGAGTTGC AGCCCCGTCC TGACCGTCCC CCTTCGGTCC GTAGATGTCT AAACACCGTG GCCCCCTCGC GGGGAGGCCG 



+2MFDS SVL CEC YDAG CAW YEL TPAE TTV
2801 ATGTTCGACT CGTCCGTCCT CTGTGAGTGC TATGACGCAG GCTGTGCTTG GTATGAGCTC ACGCCCGCCG AGACTACAGT 
TACAAGCTGA GCAGGCAGGA GACACTCACG ATACTGCGTC CGACACGAAC CATACTCGAG TGCGGGCGGC TCTGATGTCA 



+2 RLR AYMN TPG LPV CQDH LEF WEG VFT

StuI 

2881 TAGGCTACGA GCGTACATGA ACACCCCGGG GCTTCCCGTG TGCCAGGACC ATCTTGAATT TTGGGAGGGC GTCTTTACAG 
ATCCGATGCT CGCATGTACT TGTGGGGCCC CGAAGGGCAC ACGGTCCTGG TAGAACTTAA AACCCTCCCG CAGAAATGTC 



+ 2GLTH IDA HFLS QTK QSG ENLP YLV AYQ 
StuI 

2961 GCCTCACTCA TATAGATGCC CACTTTCTAT CCCAGACAAA GCAGAGTGGG GAGAACCTTC CTTACCTGGT AGCGTACCAA 
CGGAGTGAGT ATATCTACGG GTGAAAGATA GGGTCTGTTT CGTCTCACCC CTCTTGGAAG GAATGGACCA TCGCATGGTT 



+2ATVC ARA QAP PPSW DQM WKC LIRL KPT
3041 GCCACCGTGT GCGCTAGGGC TCAAGCCCCT CCCCCATCGT GGGACCAGAT GTGGAAGTGT TTGATTCGCC TCAAGCCCAC 
CGGTGGCACA CGCGATCCCG AGTTCGGGGA GGGGGTAGCA CCCTGGTCTA CACCTTCACA AACTAAGCGG AGTTCGGGTG 



+2 LHG PTPL LYR LGA VQNE ITL THP VTK 
3121 CCTCCATGGG CCAACACCCC TGCTATACAG ACTGGGCGCT GTTCAGAATG AAATCACCCT GACGCACCCA GTCACCAAAT 
GGAGGTACCC GGTTGTGGGG ACGATATGTC TGACCCGCGA CAAGTCTTAC TTTAGTGGGA CTGCGTGGGT CAGTGGTTTA 



+2YIMT CMS ADLE VVT STW VLVG GVL AAL
3201 ACATCATGAC ATGCATGTCG GCCGACCTGG AGGTCGTCAC GAGCACCTGG GTGCTCGTTG GCGGCGTCCT GGCTGCTTTG
TGTAGTACTG TACGTACAGC CGGCTGGACC TCCAGCAGTG CTCGTGGACC CACGAGCAAC CGCCGCAGGA CCGACGAAAC 



+ 2AAYC LST GCV VIVG RVV LSG KPAI IPD 
3281 GCCGCGTATT GCCTGTCAAC AGGCTGCGTG GTCATAGTGG GCAGGGTCGT CTTGTCCGGG AAGCCGGCAA TCATACCTGA 
CGGCGCATAA CGGACAGTTG TCCGACGCAC CAGTATCACC CGTCCCAGCA GAACAGGCCC TTCGGCCGTT AGTATGGACT 



+2 REV LYRE FDE MEE CSQH LPY IEQ GMM
3361 CAGGGAAGTC CTCTACCGAG AGTTCGATGA GATGGAAGAG TGCTCTCAGC ACTTACCGTA CATCGAGCAA GGGATGATGC 
GTCCCTTCAG GAGATGGCTC TCAAGCTACT CTACCTTCTC ACGAGAGTCG TGAATGGCAT GTAGCTCGTT CCCTACTACG 



+2LAEQ FKQ KALG LLQ TAS RQAE VIA PAV
3441 TCGCCGAGCA GTTCAAGCAG AAGGCCCTCG GCCTCCTGCA GACCGCGTCC CGTCAGGCAG AGGTTATCGC CCCTGCTGTC 
AGCGGCTCGT CAAGTTCGTC TTCCGGGAGC CGGAGGACGT CTGGCGCAGG GCAGTCCGTC TCCAATAGCG GGGACGACAG 



+ 2QTNW QKL ETF WAKH MWN FIS GIQY LAG 
3521 CAGACCAACT GGCAAAAACT CGAGACCTTC TGGGCGAAGC ATATGTGGAA CTTCATCAGT GGGATACAAT ACTTGGCGGG 
GTCTGGTTGA CCGTTTTTGA GCTCTGGAAG ACCCGCTTCG TATACACCTT GAAGTAGTCA CCCTATGTTA TGAACCGCCC 



+ 2 LST LPGN PAI ASL MAFT AAV TSP LTT 
3601 CTTGTCAACG CTGCCTGGTA ACCCCGCCAT TGCTTCATTG ATGGCTTTTA CAGCTGCTGT CACCAGCCCA CTAACCACTA 
GAACAGTTGC GACGGACCAT TGGGGCGGTA ACGAAGTAAC TACCGAAAAT GTCGACGACA GTGGTCGGGT GATTGGTGAT



+2SQTL LFN ILGG WVA AQL AAPG AAT AFV 
3681 GCCAAACCCT CCTCTTCAAC ATATTGGGGG GGTGGGTGGC TGCCCAGCTC GCCGCCCCCG GTGCCGCTAC TGCCTTTGTG 
CGGTTTGGGA GGAGAAGTTG TATAACCCCC CCACCCACCG ACGGGTCGAG CGGCGGGGGC CACGGCGATG ACGGAAACAC 
+2GAGL AGA AIG SVGL GKV LID ILAG YGA
3761 GGCGCTGGCT TAGCTGGCGC CGCCATCGGC AGTGTTGGAC TGGGGAAGGT CCTCATAGAC ATCCTTGCAG GGTATGGCGC 
CCGCGACCGA ATCGACCGCG GCGGTAGCCG TCACAACCTG ACCCCTTCCA GGAGTATCTG TAGGAACGTC CCATACCGCG 

+ 2 GVA GALV AFK IMS GEVP STE DLV NLL 
3841 GGGCGTGGCG GGAGCTCTTG TGGCATTCAA GATCATGAGC GGTGAGGTCC CCTCCACGGA GGACCTGGTC AATCTACTGC
CCCGCACCGC CCTCGAGAAC ACCGTAAGTT CTAGTACTCG CCACTCCAGG GGAGGTGCCT CCTGGACCAG TTAGATGACG

+2 PAIL SPG ALVV GVV CAA ILRR HVG PGE 
3921 CCGCCATCCT CTCGCCCGGA GCCCTCGTAG TCGGCGTGGT CTGTGCAGCA ATACTGCGCC GGCACGTTGG CCCGGGCGAG 
GGCGGTAGGA GAGCGGGCCT CGGGAGCATC AGCCGCACCA GACACGTCGT TATGACGCGG CCGTGCAACC GGGCCCGCTC

+2GAVQ WMN RLI AFAS RGN HVS PTHY VPE
4001 GGGGCAGTGC AGTGGATGAA CCGGCTGATA GCCTTCGCCT CCCGGGGGAA CCATGTTTCC CCCACGCACT ACGTGCCGGA
CCCCGTCACG TCACCTACTT GGCCGACTAT CGGAAGCGGA GGGCCCCCTT GGTACAAAGG GGGTGCGTGA TGCACGGCCT



+2 SDA AARV TAI LSS LTVT QLL RRL HQW 
4081 GAGCGATGCA GCTGCCCGCG TCACTGCCAT ACTCAGCAGC CTCACTGTAA CCCAGCTCCT GAGGCGACTG CACCAGTGGA
CTCGCTACGT CGACGGGCGC AGTGACGGTA TGAGTCGTCG GAGTGACATT GGGTCGAGGA CTCCGCTGAC GTGGTCACCT

+2ISSE CTT PCSG SWL RDI WDWI CEV LSD 
4161 TAAGCTCGGA GTGTACCACT CCATGCTCCG GTTCCTGGCT AAGGGACATC TGGGACTGGA TATGCGAGGT GTTGAGCGAC
ATTCGAGCCT CACATGGTGA GGTACGAGGC CAAGGACCGA TTCCCTGTAG ACCCTGACCT ATACGCTCCA CAACTCGCTG

+2FKTW LKA KLM PQLP GIP FVS CQRG YKG

BamHI

4241 TTTAAGACCT GGCTAAAAGC TAAGCTCATG CCACAGCTGC CTGGGATCCC CTTTGTGTCC TGCCAGCGCG GGTATAAGGG
AAATTCTGGA CCGATTTTCG ATTCGAGTAC GGTGTCGACG GACCCTAGGG GAAACACAGG ACGGTCGCGC CCATATTCCC



+2 VWR GDGI MHT RCH CGAE ITG HVK NGT
4321 GGTCTGGCGA GGGGACGGCA TCATGCACAC TCGCTGCCAC TGTGGAGCTG AGATCACTGG ACATGTCAAA AACGGGACGA
CCAGACCGCT CCCCTGCCGT AGTACGTGTG AGCGACGGTG ACACCTCGAC TCTAGTGACC TGTACAGTTT TTGCCCTGCT



+2MRIV GPR TCRN MWS GTF PINA YTT GPC
4401 TGAGGATCGT CGGTCCTAGG ACCTGCAGGA ACATGTGGAG TGGGACCTTC CCCATTAATG CCTACACCAC GGGCCCCTGT
ACTCCTAGCA GCCAGGATCC TGGACGTCCT TGTACACCTC ACCCTGGAAG GGGTAATTAC GGATGTGGTG CCCGGGGACA



+2TPLP APN YTF ALWR VSA EEY VEIR QVG
4481 ACCCCCCTTC CTGCGCCGAA CTACACGTTC GCGCTATGGA GGGTGTCTGC AGAGGAATAC GTGGAGATAA GGCAGGTGGG
TGGGGGGAAG GACGCGGCTT GATGTGCAAG CGCGATACCT CCCACAGACG TCTCCTTATG CACCTCTATT CCGTCCACCC



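+2 DFH YVTG MTT DNL KCPC QVP SPE FFT
4561 GGACTTCCAC TACGTGACGG GTATGACTAC TGACAATCTT AAATGCCCGT GCCAGGTCCC ATCGCCCGAA TTTTTCACAG
CCTGAAGGTG ATGCACTGCC CATACTGATG ACTGTTAGAA TTTACGGGCA CGGTCCAGGG TAGCGGGCTT AAAAAGTGTC

+2ELDG VRL HRFA PPC KPL LREE VSF RVG
4641 AATTGGACGG GGTGCGCCTA CATAGGTTTG CGCCCCCCTG CAAGCCCTTG CTGCGGGAGG AGGTATCATT CAGAGTAGGA
TTAACCTGCC CCACGCGGAT GTATCCAAAC GCGGGGGGAC GTTCGGGAAC GACGCCCTCC TCCATAGTAA GTCTCATCCT

+2LHEY PVG SQL PCEP EPD VAV LTSM LTD
4721 CTCCACGAAT ACCCGGTAGG GTCGCAATTA CCTTGCGAGC CCGAACCGGA CGTGGCCGTG TTGACGTCCA TGCTCACTGA
GAGGTGCTTA TGGGCCATCC CAGCGTTAAT GGAACGCTCG GGCTTGGCCT GCACCGGCAC AACTGCAGGT ACGAGTGACT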



+ 2 P S 
4801 
+2SQLS APS LKAT CTA NHD SPDA ELI EAN
4881 GCCAGCTATC CGCTCCATCT CTCAAGGCAA CTTGCACCGC TAACCATGAC TCCCCTGATG CTGAGCTCAT AGAGGCCAAC
CGGTCGATAG GCGAGGTAGA GAGTTCCGTT GAACGTGGCG ATTGGTACTG AGGGGACTAC GACTCGAGTA TCTCCGGTTG 



+2LLWR*QEM GGN ITRV ESE NKV VILD SFD
4961 CTCCTATGGA GGCAGGAGAT GGGCGGCAAC ATCACCAGGG TTGAGTCAGA AAACAAAGTG GTGATTCTGG ACTCCTTCGA
GAGGATACCT CCGTCCTCTA CCCGCCGTTG TAGTGGTCCC AACTCAGTCT TTTGTTTCAC CACTAAGACC TGAGGAAGCT 



+2 PLV AEED ERE ISV PAEI LRK SRR FAQ
5041 TCCGCTTGTG GCGGAGGAGG ACGAGCGGGA GATCTCCGTA CCCGCAGAAA TCCTGCGGAA GTCTCGGAGA TTCGCCCAGG 
AGGCGAACAC CGCCTCCTCC TGCTCGCCCT CTAGAGGCAT GGGCGTCTTT AGGACGCCTT CAGAGCCTCT AAGCGGGTCC 



+2ALPV WAR PDYN PPL VET WKKP DYE PPV
5121 CCCTGCCCGT TTGGGCGCGG CCGGACTATA ACCCCCCGCT AGTGGAGACG TGGAAAAAGC CCGACTACGA ACCACCTGTG 
GGGACGGGCA AACCCGCGCC GGCCTGATAT TGGGGGGCGA TCACCTCTGC ACCTTTTTCG GGCTGATGCT TGGTGGACAC 



+2VHGC PLP PPK SPPV PPP RKK RTVV LTE
5201 GTCCATGGCT GCCCGCTTCC ACCTCCAAAG TCCCCTCCTG TGCCTCCGCC TCGGAAGAAG CGGACGGTGG TCCTCACTGA 
CAGGTACCGA CGGGCGAAGG TGGAGGTTTC AGGGGAGGAC ACGGAGGCGG AGCCTTCTTC GCCTGCCACC AGGAGTGACT 



+2 STL STAL AEL ATR SFGS SST SGI TGD
5281 ATCAACCCTA TCTACTGCCT TGGCCGAGCT CGCCACCAGA AGCTTTGGCA GCTCCTCAAC TTCCGGCATT ACGGGCGACA 
TAGTTGGGAT AGATGACGGA ACCGGCTCGA GCGGTGGTCT TCGAAACCGT CGAGGAGTTG AAGGCCGTAA TGCCCGCTGT 



+2NTTT SSE PAPS GCP PDS DAES YSS MPP 
5361 ATACGACAAC ATCCTCTGAG CCCGCCCCTT CTGGCTGCCC CCCCGACTCC GACGCTGAGT CCTATTCCTC CATGCCCCCC 
TATGCTGTTG TAGGAGACTC GGGCGGGGAA GACCGACGGG GGGGCTGAGG CTGCGACTCA GGATAAGGAG GTACGGGGGG 



+2 LEGE PGD PDL SDGS WST VSS EANA EDV

BamHI 



5441 CTGGAGGGGG AGCCTGGGGA TCCGGATCTT AGCGACGGGT CATGGTCAAC GGTCAGTAGT GAGGCCAACG CGGAGGATGT 
GACCTCCCCC TCGGACCCCT AGGCCTAGAA TCGCTGCCCA GTACCAGTTG CCAGTCATCA CTCCGGTTGC GCCTCCTACA 



+2 VCC SMSY SWT GAL VTPC AAE EQK LPI
5521 CGTGTGCTGC TCAATGTCTT ACTCTTGGAC AGGCGCACTC GTCACCCCGT GCGCCGCGGA AGAACAGAAA CTGCCCATCA 
GCACACGACG AGTTACAGAA TGAGAACCTG TCCGCGTGAG CAGTGGGGCA CGCGGCGCCT TCTTGTCTTT GACGGGTAGT 



+2NALS NSL LRHH NLV YST TSRS ACQ RQK 
5601 ATGCACTAAG CAACTCGTTG CTACGTCACC ACAATTTGGT GTATTCCACC ACCTCACGCA GTGCTTGCCA AAGGCAGAAG 
TACGTGATTC GTTGAGCAAC GATGCAGTGG TGTTAAACCA CATAAGGTGG TGGAGTGCGT CACGAACGGT TTCCGTCTTC 



+2KVTF DRL QVL DSHY QDV LKE VKAA ASK
5681 AAAGTCACAT TTGACAGACT GCAAGTTCTG GACAGCCATT ACCAGGACGT ACTCAAGGAG GTTAAAGCAG CGGCGTCAAA 
TTTCAGTGTA AACTGTCTGA CGTTCAAGAC CTGTCGGTAA TGGTCCTGCA TGAGTTCCTC CAATTTCGTC GCCGCAGTTT



+2 VKA NLLS VEE ACS LTPP HSA KSK FGY
5761 AGTGAAGGCT AACTTGCTAT CCGTAGAGGA AGCTTGCAGC CTGACGCCCC CACACTCAGC CAAATCCAAG TTTGGTTATG 
TCACTTCCGA TTGAACGATA GGCATCTCCT TCGAACGTCG GACTGCGGGG GTGTGAGTCG GTTTAGGTTC AAACCAATAC 



+2GAKD VRC HARK AVT HIN SVWK DLL EDN
5841 GGGCAAAAGA CGTCCGTTGC CATGCCAGAA AGGCCGTAAC CCACATCAAC TCCGTGTGGA AAGACCTTCT GGAAGACAAT 
CCCGTTTTCT GCAGGCAACG GTACGGTCTT TCCGGCATTG GGTGTAGTTG AGGCACACCT TTCTGGAAGA CCTTCTGTTA 



+2VTPI DTT IMA KNEV FCV QPE KGGR KPA
5921 GTAACACCAA TAGACACTAC CATCATGGCT AAGAACGAGG TTTTCTGCGT TCAGCCTGAG AAGGGGGGTC GTAAGCCAGC 
CATTGTGGTT ATCTGTGATG GTAGTACCGA TTCTTGCTCC AAAAGACGCA AGTCGGACTC TTCCCCCCAG CATTCGGTCG 
+2 RLI VFPD LGV RVC EKMA LYD VVT KLP
6001 TCGTCTCATC GTGTTCCCCG ATCTGGGCGT GCGCGTGTGC GAAAAGATGG CTTTGTACGA CGTGGTTACA AAGCTCCCCT 
AGCAGAGTAG CACAAGGGGC TAGACCCGCA CGCGCACACG CTTTTCTACC GAAACATGCT GCACCAATGT TTCGAGGGGA



+2LAVM GSS YGFQ YSP GQR VEFL VQA WKS

EcoRI 



6081 TGGCCGTGAT GGGAAGCTCC TACGGATTCC AATACTCACC AGGACAGCGG GTTGAATTCC TCGTGCAAGC GTGGAAGTCC 
ACCGGCACTA CCCTTCGAGG ATGCCTAAGG TTATGAGTGG TCCTGTCGCC CAACTTAAGG AGCACGTTCG CACCTTCAGG 



+2KKTP MGF SYD TRCF DST VTE SDIR TEE
6161 AAGAAAACCC CAATGGGGTT CTCGTATGAT ACCCGCTGCT TTGACTCCAC AGTCACTGAG AGCGACATCC GTACGGAGGA 
TTCTTTTGGG GTTACCCCAA GAGCATACTA TGGGCGACGA AACTGAGGTG TCAGTGACTC TCGCTGTAGG CATGCCTCCT 



+2 AIY QCCD LDP QAR VAIK SLT ERL YVG
6241 GGCAATCTAC CAATGTTGTG ACCTCGACCC CCAAGCCCGC GTGGCCATCA AGTCCCTCAC CGAGAGGCTT TATGTTGGGG 
CCGTTAGATG GTTACAACAC TGGAGCTGGG GGTTCGGGCG CACCGGTAGT TCAGGGAGTG GCTCTCCGAA ATACAACCCC 



+2GPLT NSR GENC GYR RCR ASGV LTT SCG 
6321 GCCCTCTTAC CAATTCAAGG GGGGAGAACT GCGGCTATCG CAGGTGCCGC GCGAGCGGCG TACTGACAAC TAGCTGTGGT 
CGGGAGAATG GTTAAGTTCC CCCCTCTTGA CGCCGATAGC GTCCACGGCG CGCTCGCCGC ATGACTGTTG ATCGACACCA 



+2NTLT CYI KAR AACR A A G LQD CTML VCG 
6401 AACACCCTCA CTTGCTACAT CAAGGCCCGG GCAGCCTGTC GAGCCGCAGG GCTCCAGGAC TGCACCATGC TCGTGTGTGG 
TTGTGGGAGT GAACGATGTA GTTCCGGGCC CGTCGGACAG CTCGGCGTCC CGAGGTCCTG ACGTGGTACG AGCACACACC 



+2 DDL VVIC ESA GVQ EDAA SLR AFT EAM 
6481 CGACGACTTA GTCGTTATCT GTGAAAGCGC GGGGGTCCAG GAGGACGCGG CGAGCCTGAG AGCCTTCACG GAGGCTATGA 
GCTGCTGAAT CAGCAATAGA CACTTTCGCG CCCCCAGGTC CTCCTGCGCC GCTCGGACTC TCGGAAGTGC CTCCGATACT 



+2TRYS APP GDPP QPE YDL ELIT SCS SNV
6561 CCAGGTACTC CGCCCCCCCT GGGGACCCCC CACAACCAGA ATACGACTTG GAGCTCATAA CATCATGCTC CTCCAACGTG 
GGTCCATGAG GCGGGGGGGA CCCCTGGGGG GTGTTGGTCT TATGCTGAAC CTCGAGTATT GTAGTACGAG GAGGTTGCAC 



+2SVAH DGA GKR VYYL TRD PTT PLAR AAW
6641 TCAGTCGCCC ACGACGGCGC TGGAAAGAGG GTCTACTACC TCACCCGTGA CCCTACAACC CCCCTCGCGA GAGCTGCGTG 
AGTCAGCGGG TGCTGCCGCG ACCTTTCTCC CAGATGATGG AGTGGGCACT GGGATGTTGG GGGGAGCGCT CTCGACGCAC 



+2 ETA RHTP VNS WLG NIIM FAP TLW ARM
6721 GGAGACAGCA AGACACACTC CAGTCAATTC CTGGCTAGGC AACATAATCA TGTTTGCCCC CACACTGTGG GCGAGGATGA 
CCTCTGTCGT TCTGTGTGAG GTCAGTTAAG GACCGATCCG TTGTATTAGT ACAAACGGGG GTGTGACACC CGCTCCTACT 



+ 2ILMT HFF SVLI ARD QLE QALD CEI YGA 
6801 TACTGATGAC CCATTTCTTT AGCGTCCTTA TAGCCAGGGA CCAGCTTGAA CAGGCCCTCG ATTGCGAGAT CTACGGGGCC 
ATGACTACTG GGTAAAGAAA TCGCAGGAAT ATCGGTCCCT GGTCGAACTT GTCCGGGAGC TAACGCTCTA GATGCCCCGG 



+ 2CYSI EPL DLP PIIQ RLH GLS AFSL HSY 
6881 TGCTACTCCA TAGAACCACT GGATCTACCT CCAATCATTC AAAGACTCCA TGGCCTCAGC GCATTTTCAC TCCACAGTTA 
ACGATGAGGT ATCTTGGTGA CCTAGATGGA GGTTAGTAAG TTTCTGAGGT ACCGGAGTCG CGTAAAAGTG AGGTGTCAAT 



+2 SPG EINR V A A CLR KLGV PPL RAW RHR 
6961 CTCTCCAGGT GAAATCAATA GGGTGGCCGC ATGCCTCAGA AAACTTGGGG TACCGCCCTT GCGAGCTTGG AGACACCGGG 
GAGAGGTCCA CTTTAGTTAT CCCACCGGCG TACGGAGTCT TTTGAACCCC ATGGCGGGAA CGCTCGAACC TCTGTGGCCC 



+ 2ARSV RAR LLAR GGR A A I CGKY LFN WAV 
7041 CCCGGAGCGT CCGCGCTAGG CTTCTGGCCA GAGGAGGCAG GGCTGCCATA TGTGGCAAGT ACCTCTTCAA CTGGGCAGTA 
GGGCCTCGCA GGCGCGATCC GAAGACCGGT CTCCTCCGTC CCGACGGTAT ACACCGTTCA TGGAGAAGTT GACCCGTCAT 
+ 2RTKL KLT P I A A A G Q LDL SGW F T A G Y S G
7121 AGAACAAAGC TCAAACTCAC TCCAATAGCG GCCGCTGGCC AGCTGGACTT GTCCGGCTGG TTCACGGCTG GCTACAGCGG 
TCTTGTTTCG AGTTTGAGTG AGGTTATCGC CGGCGACCGG TCGACCTGAA CAGGCCGACC AAGTGCCGAC CGATGTCGCC 



+2 GDI YHSV SHA RPR WIWF CLL LLA A G V 
7201 GGGAGACATT TATCACAGCG TGTCTCATGC CCGGCCCCGC TGGATCTGGT TTTGCCTACT CCTGCTTGCT GCAGGGGTAG 
CCCTCTGTAA ATAGTGTCGC ACAGAGTACG GGCCGGGGCG ACCTAGACCA AAACGGATGA GGACGAACGA CGTCCCCATC 



+ 2GIYL LPN R 
7281 GCATCTACCT CCTCCCCAAC CGATGAAGGT TGGGGTAAAC ACTCCGGCCT AAAAAAAAAA AAAAATCTAG AAAGGCGCGC 
CGTAGATGGA GGAGGGGTTG GCTACTTCCA ACCCCATTTG TGAGGCCGGA TTTTTTTTTT TTTTTAGATC TTTCCGCGCG 



BamHI MluI



7361 


CAAGATATCA 
GTTCTATAGT 


AGGATCCACT 
TCCTAGGTGA 


ACGCGTTAGA 
TGCGCAATCT 


GCTCGCTGAT 
CGAGCGACTA 


CAGCCTCGAC 
GTCGGAGCTG 


TGTGCCTTCT 
ACACGGAAGA 


AGTTGCCAGC 
TCAACGGTCG 


CATCTGTTGT 
GTAGACAACA 


7441 


TTGCCCCTCC 
AACGGGGAGG 


CCCGTGCCTT 
GGGCACGGAA 


CCTTGACCCT 
GGAACTGGGA 


GGAAGGTGCC 
CCTTCCACGG 


ACTCCCACTG 
TGAGGGTGAC 


TCCTTTCCTA 
AGGAAAGGAT 


ATAAAATGAG 
TATTTTACTC 


GAAATTGCAT 
CTTTAACGTA 


7521 


CGCATTGTCT 
GCGTAACAGA 


GAGTAGGTGT 
CTCATCCACA 


CATTCTATTC 
GTAAGATAAG 


TGGGGGGTGG 
ACCCCCCACC 


GGTGGGGCAG 
CCACCCCGTC 


GACAGCAAGG 
CTGTCGTTCC 


GGGAGGATTG 
CCCTCCTAAC 


GGAAGACAAT 
CCTTCTGTTA 


7601 


AGCAGGCATG 
TCGTCCGTAC 


CTGGGGAGCT 
GACCCCTCGA 


CTTCCGCTTC 
GAAGGCGAAG 


CTCGCTCACT 
GAGCGAGTGA 


GACTCGCTGC 
CTGAGCGACG 


GCTCGGTCGT 
CGAGCCAGCA 


TCGGCTGCGG 
AGCCGACGCC 


CGAGCGGTAT 
GCTCGCCATA 


7681 


CAGCTCACTC 
GTCGAGTGAG 


AAAGGCGGTA 
TTTCCGCCAT 


ATACGGTTAT 
TATGCCAATA 


CCACAGAATC 
GGTGTCTTAG 


AGGGGATAAC 
TCCCCTATTG 


GCAGGAAAGA 
CGTCCTTTCT 


ACATGTGAGC 
TGTACACTCG 


AAAAGGCCAG 
TTTTCCGGTC 


7761 


CAAAAGGCCA 
GTTTTCCGGT 


GGAACCGTAA 
CCTTGGCATT 


AAAGGCCGCG 
TTTCCGGCGC 


TTGCTGGCGT 
AACGACCGCA 


TTTTCCATAG 
AAAAGGTATC 


GCTCCGCCCC 
CGAGGCGGGG 


CCTGACGAGC 
GGACTGCTCG 


ATCACAAAAA 
TAGTGTTTTT 


7841 


TCGACGCTCA 
AGCTGCGAGT 


AGTCAGAGGT 
TCAGTCTCCA 


GGCGAAACCC 
CCGCTTTGGG 


GACAGGACTA 
CTGTCCTGAT 


TAAAGATACC 
ATTTCTATGG 


AGGCGTTTCC 
TCCGCAAAGG 


CCCTGGAAGC 
GGGACCTTCG 


TCCCTCGTGC 
AGGGAGCACG 


7921 


GCTCTCCTGT 
CGAGAGGACA 


TCCGACCCTG 
AGGCTGGGAC 


CCGCTTACCG 
GGCGAATGGC 


GATACCTGTC 
CTATGGACAG 


CGCCTTTCTC 
GCGGAAAGAG 


CCTTCGGGAA 
GGAAGCCCTT 


GCGTGGCGCT 
CGCACCGCGA 


TTCTCAATGC 
AAGAGTTACG 


8001 


TCACGCTGTA 
AGTGCGACAT 


GGTATCTCAG 
CCATAGAGTC 


TTCGGTGTAG 
AAGCCACATC 


GTCGTTCGCT 
CAGCAAGCGA 


CCAAGCTGGG 
GGTTCGACCC 


CTGTGTGCAC 
GACACACGTG 


GAACCCCCCG 
CTTGGGGGGC 


TTCAGCCCGA 
AAGTCGGGCT 


8081 


CCGCTGCGCC 
GGCGACGCGG 


TTATCCGGTA 
AATAGGCCAT 


ACTATCGTCT 
TGATAGCAGA 


TGAGTCCAAC 
ACTCAGGTTG 


CCGGTAAGAC 
GGCCATTCTG 


ACGACTTATC 
TGCTGAATAG 


GCCACTGGCA 
CGGTGACCGT 


GCAGCCACTG 
CGTCGGTGAC 


8161 


GTAACAGGAT 
CATTGTCCTA 


TAGCAGAGCG 
ATCGTCTCGC 


AGGTATGTAG 
TCCATACATC 


GCGGTGCTAC 
CGCCACGATG 


AGAGTTCTTG 
TCTCAAGAAC 


AAGTGGTGGC 
TTCACCACCG 


CTAACTACGG 
GATTGATGCC 


CTACACTAGA 
GATGTGATCT 


8241 


AGGACAGTAT 
TCCTGTCATA 


TTGGTATCTG 
AACCATAGAC 


CGCTCTGCTG 
GCGAGACGAC 


AAGCCAGTTA 
TTCGGTCAAT 


CCTTCGGAAA 
GGAAGCCTTT 


AAGAGTTGGT 
TTCTCAACCA 


AGCTCTTGAT 
TCGAGAACTA 


CCGGCAAACA 
GGCCGTTTGT 


8321 


AACCACCGCT 
TTGGTGGCGA 


GGTAGCGGTG 
CCATCGCCAC 


GTTTTTTTGT 
CAAAAAAACA 


TTGCAAGCAG 
AACGTTCGTC 


CAGATTACGC 
GTCTAATGCG 


GCAGAAAAAA AGGATCTCAA 
CGTCTTTTTT TCCTAGAGTT 


GAAGATCCTT 
CTTCTAGGAA 


8401 


TGATCTTTTC 
ACTAGAAAAG 


TACGGGGTCT 
ATGCCCCAGA 


GACGCTCAGT 
CTGCGAGTCA 


GGAACGAAAA 
CCTTGCTTTT 


CTCACGTTAA 
GAGTGCAATT 


GGGATTTTGG 
CCCTAAAACC 


TCATGAGATT 
AGTACTCTAA 


ATCAAAAAGG 
TAGTTTTTCC 
8481 ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG 
TAGAAGTGGA TCTAGGAAAA TTTAATTTTT ACTTCAAAAT TTAGTTAGAT TTCATATATA CTCATTTGAA CCAGACTGTC 

8561 TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 
AATGGTTACG AATTAGTCAC TCCGTGGATA GAGTCGCTAG ACAGATAAAG CAAGTAGGTA TCAACGGACT GAGGGGCAGC 

8641 TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT 
ACATCTATTG ATGCTATGCC CTCCCGAATG GTAGACCGGG GTCACGACGT TACTATGGCG CTCTGGGTGC GAGTGGCCGA 

8721 CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA 
GGTCTAAATA GTCGTTATTT GGTCGGTCGG CCTTCCCGGC TCGCGTCTTC ACCAGGACGT TGAAATAGGC GGAGGTAGGT 

8801 GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 
CAGATAATTA ACAACGGCCC TTCGATCTCA TTCATCAAGC GGTCAATTAT CAAACGCGTT GCAACAACGG TAACGATGTC 

8881 GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC 
CGTAGCACCA CAGTGCGAGC AGCAAACCAT ACCGAAGTAA GTCGAGGCCA AGGGTTGCTA GTTCCGCTCA ATGTACTAGG 

8961 CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT 
GGGTACAACA CGTTTTTTCG CCAATCGAGG AAGCCAGGAG GCTAGCAACA GTCTTCATTC AACCGGCGTC ACAATAGTGA 

9041 CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA 
GTACCAATAC CGTCGTGACG TATTAAGAGA ATGACAGTAC GGTAGGCATT CTACGAAAAG ACACTGACCA CTCATGAGTT 

9121 CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT 
GGTTCAGTAA GACTCTTATC ACATACGCCG CTGGCTCAAC GAGAACGGGC CGCAGTTATG CCCTATTATG GCGCGGTGTA 

9201 AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC 
TCGTCTTGAA ATTTTCACGA GTAGTAACCT TTTGCAAGAA GCCCCGCTTT TGAGAGTTCC TAGAATGGCG ACAACTCTAG 

9281 CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA 
GTCAAGCTAC ATTGGGTGAG CACGTGGGTT GACTAGAAGT CGTAGAAAAT GAAAGTGGTC GCAAAGACCC ACTCGTTTTT 

9361 CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT 
GTCCTTCCGT TTTACGGCGT TTTTTCCCTT ATTCCCGCTG TGCCTTTACA ACTTATGAGT ATGAGAAGGA AAAAGTTATA 

9441 TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT 
ATAACTTCGT AAATAGTCCC AATAACAGAG TACTCGCCTA TGTATAAACT TACATAAATC TTTTTATTTG TTTATCCCCA 

9521 TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA TTATCATGAC ATTAACCTAT AAAAATAGGC 
AGGCGCGTGT AAAGGGGCTT TTCACGGTGG ACTGCAGATT CTTTGGTAAT AATAGTACTG TAATTGGATA TTTTTATCCG 



9601 GTATCACGAG GCCCTTTCGT C 
CATAGTGCTC CGGGAAAGCA G 
1 


TCGCGCGTTT 
AGCGCGCAAA 


CGGTGATGAC 
GCCACTACTG 


GGTGAAAACC 
CCACTTTTGG 


TCTGACACAT 
AGACTGTGTA 


GCAGCTCCCG 
CGTCGAGGGC 


GAGACGGTCA 
CTCTGCCAGT 


CAGCTTGTCT 
GTCGAACAGA 


GTAAGCGGAT 
CATTCGCCTA 


81 


GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA 
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT 


161 


GCAGATTGTA 
CGTCTAACAT 


CTGAGAGTGC 
GACTCTCACG 


ACCATATGAA 
TGGTATACTT 


GCTTTTTGCA 
CGAAAAACGT 


AAAGCCTAGG 
TTTCGGATCC 


CCTCCAAAAA 
GGAGGTTTTT 


AGCCTCCTCA 
TCGGAGGAGT 


CTACTTCTGG 
GATGAAGACC 


241 


AATAGCTCAG 
TTATCGAGTC 


AGGCCGAGGC 
TCCGGCTCCG 


GGCCTCGGCC 
CCGGAGCCGG 


TCTGCATAAA 
AGACGTATTT 


TAAAAAAAAT 
ATTTTTTTTA 


TAGTCAGCCA 
ATCAGTCGGT 


TGGGGCGGAG 
ACCCCGCCTC 


AATGGGCGGA 
TTACCCGCCT 


321 


ACTGGGCGGG 
TGACCCGCCC 


GAGGGAATTA 
CTCCCTTAAT 


TTGGCTATTG 
AACCGATAAC 


GCCATTGCAT 
CGGTAACGTA 


ACGTTGTATC 
TGCAACATAG 


TATATCATAA 
ATATAGTATT 


TATGTACATT 
ATACATGTAA 


TATATTGGCT 
ATATAACCGA 


401 


CATGTCCAAT 
GTACAGGTTA 


ATGACCGCCA 
TACTGGCGGT 


TGTTGACATT 
ACAACTGTAA 


GATTATTGAC 
CTAATAACTG 


TAGTTATTAA 
ATCAATAATT 


TAGTAATCAA 
ATCATTAGTT 


TTACGGGGTC 
AATGCCCCAG 


ATTAGTTCAT 
TAATCAAGTA 


481 


AGCCCATATA 
TCGGGTATAT 


TGGAGTTCCG 
ACCTCAAGGC 


CGTTACATAA 
GCAATGTATT 


CTTACGGTAA 
GAATGCCATT 


ATGGCCCGCC 
TACCGGGCGG 


TGGCTGACCG 
ACCGACTGGC 


CCCAACGACC 
GGGTTGCTGG 


CCCGCCCATT 
GGGCGGGTAA 


561 


GACGTCAATA 
CTGCAGTTAT 


ATGACGTATG 
TACTGCATAC 


TTCCCATAGT 
AAGGGTATCA 


AACGCCAATA 
TTGCGGTTAT 


GGGACTTTCC 
CCCTGAAAGG 


ATTGACGTCA 
TAACTGCAGT 


ATGGGTGGAG 
TACCCACCTC 


TATTTACGGT 
ATAAATGCCA 


641 


AAACTGCCCA 
TTTGACGGGT 


CTTGGCAGTA 
GAACCGTCAT 


CATCAAGTGT 
GTAGTTCACA 


ATCATATGCC 
TAGTATACGG 


AAGTCCGCCC 
TTCAGGCGGG 


CCTATTGACG 
GGATAACTGC 


TCAATGACGG 
AGTTACTGCC 


TAAATGGCCC 
ATTTACCGGG 


721 


GCCTGGCATT 
CGGACCGTAA 


ATGCCCAGTA 
TACGGGTCAT 


CATGACCTTA 
GTACTGGAAT 


CGGGACTTTC 
GCCCTGAAAG 


CTACTTGGCA 
GATGAACCGT 


GTACATCTAC 
CATGTAGATG 


GTATTAGTCA 
CATAATCAGT 


TCGCTATTAC 
AGCGATAATG 


801 


CATGGTGATG 
GTACCACTAC 


CGGTTTTGGC 
GCCAAAACCG 


AGTACACCAA 
TCATGTGGTT 


TGGGCGTGGA 
ACCCGCACCT 


TAGCGGTTTG 
ATCGCCAAAC 


ACTCACGGGG 
TGAGTGCCCC 


ATTTCCAAGT 
TAAAGGTTCA 


CTCCACCCCA 
GAGGTGGGGT 


881 


TTGACGTCAA 
AACTGCAGTT 


TGGGAGTTTG 
ACCCTCAAAC 


TTTTGGCACC 
AAAACCGTGG 


AAAATCAACG 
TTTTAGTTGC 


GGACTTTCCA 
CCTGAAAGGT 


AAATGTCGTA 
TTTACAGCAT 


ATAACCCCGC 
TATTGGGGCG 


CCCGTTGACG 
GGGCAACTGC 


961 


CAAATGGGCG 
GTTTACCCGC 


GTAGGCGTGT 
CATCCGCACA 


ACGGTGGGAG 
TGCCACCCTC 


GTCTATATAA 
CAGATATATT 


GCAGAGCTCG 
CGTCTCGAGC 


TTTAGTGAAC 
AAATCACTTG 


CGTCAGATCG 
GCAGTCTAGC 


CCTGGAGACG 
GGACCTCTGC 


1041 


CCATCCACGC 
GGTAGGTGCG 


TGTTTTGACC 
ACAAAACTGG 


TCCATAGAAG 
AGGTATCTTC 


ACACCGGGAC 
TGTGGCCCTG 


CGATCCAGCC 
GCTAGGTCGG 


TCCGCGGCCG 
AGGCGCCGGC 


GGAACGGTGC 
CCTTGCCACG 


ATTGGAACGC 
TAACCTTGCG 


1121 


GGATTCCCCG 
CCTAAGGGGC 


TGCCAAGAGT 
ACGGTTCTCA 


GACGTAAGTA 
CTGCATTCAT 


CCGCCTATAG 
GGCGGATATC 


ACTCTATAGG 
TGAGATATCC 


CACACCCCTT 
GTGTGGGGAA 


TGGCTCTTAT 
ACCGAGAATA 


GCATGCTATA 
CGTACGATAT 


1201 


CTGTTTTTGG 
GACAAAAACC 


CTTGGGGCCT 
GAACCCCGGA 


ATACACCCCC 
TATGTGGGGG 


GCTCCTTATG 
CGAGGAATAC 


CTATAGGTGA 
GATATCCACT 


TGGTATAGCT TAGCCTATAG 
ACCATATCGA ATCGGATATC 


GTGTGGGTTA 
CACACCCAAT 


1281 


TTGACCATTA 
AACTGGTAAT 


TTGACCACTC 
AACTGGTGAG 


CCCTATTGGT 
GGGATAACCA 


GACGATACTT 
CTGCTATGAA 


TCCATTACTA 
AGGTAATGAT 


ATCCATAACA 
TAGGTATTGT 


TGGCTCTTTG 
ACCGAGAAAC 


CCACAACTAT 
GGTGTTGATA 



1361 CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT 
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 



1441 TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT 
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA 
1521 CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 



1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 



1681 AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA 



1761 GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT 
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA 



1841 TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA 



EcoRI 



1921 GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCAGA CTCGAGCAAG 
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTCT GAGCTCGTTC 



XbaI BamHI MluI



2001 TCTAGAAAGG CGCGCCAAGA TATCAAGGAT CCACTACGCG TTAGAGCTCG CTGATCAGCC TCGACTGTGC CTTCTAGTTG 
AGATCTTTCC GCGCGGTTCT ATAGTTCCTA GGTGATGCGC AATCTCGAGC GACTAGTCGG AGCTGACACG GAAGATCAAC 



2081 CCAGCCATCT GTTGTTTGCC CCTCCCCCGT GCCTTCCTTG ACCCTGGAAG GTGCCACTCC CACTGTCCTT TCCTAATAAA 
GGTCGGTAGA CAACAAACGG GGAGGGGGCA CGGAAGGAAC TGGGACCTTC CACGGTGAGG GTGACAGGAA AGGATTATTT 



2161 ATGAGGAAAT TGCATCGCAT TGTCTGAGTA GGTGTCATTC TATTCTGGGG GGTGGGGTGG GGCAGGACAG CAAGGGGGAG 
TACTCCTTTA ACGTAGCGTA ACAGACTCAT CCACAGTAAG ATAAGACCCC CCACCCCACC CCGTCCTGTC GTTCCCCCTC 



2241 GATTGGGAAG ACAATAGCAG GCATGCTGGG GAGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG GTCGTTCGGC 
CTAACCCTTC TGTTATCGTC CGTACGACCC CTCGAGAAGG CGAAGGAGCG AGTGACTGAG CGACGCGAGC CAGCAAGCCG 



2321 TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG 
ACGCCGCTCG CCATAGTCGA GTGAGTTTCC GCCATTATGC CAATAGGTGT CTTAGTCCCC TATTGCGTCC TTTCTTGTAC 



2401 TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 
ACTCGTTTTC CGGTCGTTTT CCGGTCCTTG GCATTTTTCC GGCGCAACGA CCGCAAAAAG GTATCCGAGG CGGGGGGACT 



2481 CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 
GCTCGTAGTG TTTTTAGCTG CGAGTTCAGT CTCCACCGCT TTGGGCTGTC CTGATATTTC TATGGTCCGC AAAGGGGGAC 



2561 GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG 
CTTCGAGGGA GCACGCGAGA GGACAAGGCT GGGACGGCGA ATGGCCTATG GACAGGCGGA AAGAGGGAAG CCCTTCGCAC 



2641 GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 
CGCGAAAGAG TTACGAGTGC GACATCCATA GAGTCAAGCC ACATCCAGCA AGCGAGGTTC GACCCGACAC ACGTGCTTGG 



2721 CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC 
GGGGCAAGTC GGGCTGGCGA CGCGGAATAG GCCATTGATA GCAGAACTCA GGTTGGGCCA TTCTGTGCTG AATAGCGGTG 



2801 TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC 
ACCGTCGTCG GTGACCATTG TCCTAATCGT CTCGCTCCAT ACATCCGCCA CGATGTCTCA AGAACTTCAC CACCGGATTG 



2881 TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 
ATGCCGATGT GATCTTCCTG TCATAAACCA TAGACGCGAG ACGACTTCGG TCAATGGAAG CCTTTTTCTC AACCATCGAG 
2961 TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT 
AACTAGGCCG TTTGTTTGGT GGCGACCATC GCCACCAAAA AAACAAACGT TCGTCGTCTA ATGCGCGTCT TTTTTTCCTA 



3041 CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG 
GAGTTCTTCT AGGAAACTAG AAAAGATGCC CCAGACTGCG AGTCACCTTG CTTTTGAGTG CAATTCCCTA AAACCAGTAC 



3121 AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA 
TCTAATAGTT TTTCCTAGAA GTGGATCTAG GAAAATTTAA TTTTTACTTC AAAATTTAGT TAGATTTCAT ATATACTCAT 



3201 AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG 
TTGAACCAGA CTGTCAATGG TTACGAATTA GTCACTCCGT GGATAGAGTC GCTAGACAGA TAAAGCAAGT AGGTATCAAC 



3281 CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC 
GGACTGAGGG GCAGCACATC TATTGATGCT ATGCCCTCCC GAATGGTAGA CCGGGGTCAC GACGTTACTA TGGCGCTCTG 



3361 CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 
GGTGCGAGTG GCCGAGGTCT AAATAGTCGT TATTTGGTCG GTCGGCCTTC CCGGCTCGCG TCTTCACCAG GACGTTGAAA 



3441 ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG 
TAGGCGGAGG TAGGTCAGAT AATTAACAAC GGCCCTTCGA TCTCATTCAT CAAGCGGTCA ATTATCAAAC GCGTTGCAAC 



3521 TTGCCATTGC TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG 
AACGGTAACG ATGTCCGTAG CACCACAGTG CGAGCAGCAA ACCATACCGA AGTAAGTCGA GGCCAAGGGT TGCTAGTTCC 



3601 CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 
GCTCAATGTA CTAGGGGGTA CAACACGTTT TTTCGCCAAT CGAGGAAGCC AGGAGGCTAG CAACAGTCTT CATTCAACCG 



3681 CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA 
GCGTCACAAT AGTGAGTACC AATACCGTCG TGACGTATTA AGAGAATGAC AGTACGGTAG GCATTCTACG AAAAGACACT 



3761 CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AATACGGGAT 
GACCACTCAT GAGTTGGTTC AGTAAGACTC TTATCACATA CGCCGCTGGC TCAACGAGAA CGGGCCGCAG TTATGCCCTA 



3841 AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 
TTATGGCGCG GTGTATCGTC TTGAAATTTT CACGAGTAGT AACCTTTTGC AAGAAGCCCC GCTTTTGAGA GTTCCTAGAA 



3921 ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT 
TGGCGACAAC TCTAGGTCAA GCTACATTGG GTGAGCACGT GGGTTGACTA GAAGTCGTAG AAAATGAAAG TGGTCGCAAA 



4001 CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC 
GACCCACTCG TTTTTGTCCT TCCGTTTTAC GGCGTTTTTT CCCTTATTCC CGCTGTGCCT TTACAACTTA TGAGTATGAG 



4081 TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 
AAGGAAAAAG TTATAATAAC TTCGTAAATA GTCCCAATAA CAGAGTACTC GCCTATGTAT AAACTTACAT AAATCTTTTT 



4161 TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC ATGACATTAA 
ATTTGTTTAT CCCCAAGGCG CGTGTAAAGG GGCTTTTCAC GGTGGACTGC AGATTCTTTG GTAATAATAG TACTGTAATT 



4241 CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTC 
GGATATTTTT ATCCGCATAG TGCTCCGGGA AAGCAG 
1 


TCGCGCGTTT 
AGCGCGCAAA 


CGGTGATGAC 
GCCACTACTG 


GGTGAAAACC 
CCACTTTTGG 


TCTGACACAT 
AGACTGTGTA 


GCAGCTCCCG 
CGTCGAGGGC 


51 


GAGACGGTCA 
CTCTGCCAGT 


CAGCTTGTCT 
GTCGAACAGA 


GTAAGCGGAT 
CATTCGCCTA 


GCCGGGAGCA 
CGGCCCTCGT 


GACAAGCCCG 
CTGTTCGGGC 


101 


TCAGGGCGCG 
AGTCCCGCGC 


TCAGCGGGTG 
AGTCGCCCAC 


TTGGCGGGTG 
AACCGCCCAC 


TCGGGGCTGG 
AGCCCCGACC 


CTTAACTATG 
GAATTGATAC 


151 


CGGCATCAGA 
GCCGTAGTCT 


GCAGATTGTA 
CGTCTAACAT 


CTGAGAGTGC 
GACTCTCACG 


ACCATATGAA 
TGGTATACTT 


GCTTTTTGCA 
CGAAAAACGT 




StuI 








201 


AAAGCCTAGG 
TTTCGGATCC 


CCTCCAAAAA 
GGAGGTTTTT 


AGCCTCCTCA 
TCGGAGGAGT 


CTACTTCTGG 
GATGAAGACC 


AATAGCTCAG 
TTATCGAGTC 


251 


AGGCCGAGGC 
TCCGGCTCCG 


GGCCTCGGCC 
CCGGAGCCGG 


TCTGCATAAA 
AGACGTATTT 


TAAAAAAAAT 
ATTTTTTTTA 


TAGTCAGCCA 
ATCAGTCGGT 


301 


TGGGGCGGAG 
ACCCCGCCTC 


AATGGGCGGA ACTGGGCGGG 
TTACCCGCCT TGACCCGCCC 


GAGGGAATTA 
CTCCCTTAAT 


TTGGCTATTG 
AACCGATAAC 


351 


GCCATTGCAT 
CGGTAACGTA 


ACGTTGTATC 
TGCAACATAG 


TATATCATAA 
ATATAGTATT 


TATGTACATT 
ATACATGTAA 


TATATTGGCT 
ATATAACCGA 


401 


CATGTCCAAT 
GTACAGGTTA 


ATGACCGCCA 
TACTGGCGGT 


TGTTGACATT 
ACAACTGTAA 


GATTATTGAC 
CTAATAACTG 


TAGTTATTAA 
ATCAATAATT 


451 


TAGTAATCAA 
ATCATTAGTT 


TTACGGGGTC 
AATGCCCCAG 


ATTAGTTCAT 
TAATCAAGTA 


AGCCCATATA 
TCGGGTATAT 


TGGAGTTCCG 
ACCTCAAGGC 


501 


CGTTACATAA 
GCAATGTATT 


CTTACGGTAA 
GAATGCCATT 


ATGGCCCGCC 
TACCGGGCGG 


TGGCTGACCG 
ACCGACTGGC 


CCCAACGACC 
GGGTTGCTGG 


551 


CCCGCCCATT 
GGGCGGGTAA 


GACGTCAATA 
CTGCAGTTAT 


ATGACGTATG 
TACTGCATAC 


TTCCCATAGT 
AAGGGTATCA 


AACGCCAATA 
TTGCGGTTAT 


601 


GGGACTTTCC 
CCCTGAAAGG 


ATTGACGTCA 
TAACTGCAGT 


ATGGGTGGAG 
TACCCACCTC 


TATTTACGGT 
ATAAATGCCA 


AAACTGCCCA 
TTTGACGGGT 


651 


CTTGGCAGTA 
GAACCGTCAT 


CATCAAGTGT 
GTAGTTCACA 


ATCATATGCC 
TAGTATACGG 


AAGTCCGCCC 
TTCAGGCGGG 


CCTATTGACG 
GGATAACTGC 


701 


TCAATGACGG 
AGTTACTGCC 


TAAATGGCCC 
ATTTACCGGG 


GCCTGGCATT 
CGGACCGTAA 


ATGCCCAGTA 
TACGGGTCAT 


CATGACCTTA 
GTACTGGAAT 


751 


CGGGACTTTC 
GCCCTGAAAG 


CTACTTGGCA 
GATGAACCGT 


GTACATCTAC 
CATGTAGATG 


GTATTAGTCA 
CATAATCAGT 


TCGCTATTAC 
AGCGATAATG 


801 


CATGGTGATG 
GTACCACTAC 


CGGTTTTGGC 
GCCAAAACCG 


AGTACACCAA 
TCATGTGGTT 


TGGGCGTGGA 
ACCCGCACCT 


TAGCGGTTTG 
ATCGCCAAAC 



851 ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG 
TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT AACTGCAGTT ACCCTCAAAC 
901 TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC 
AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG 



951 CCCGTTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA 
GGGCAACTGC GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT 



1001 GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG CCATCCACGC 
CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC GGTAGGTGCG 



1051 TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG 
ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC 



1101 GGAACGGTGC ATTGGAACGC GGATTCCCCG TGCCAAGAGT GACGTAAGTA 
CCTTGCCACG TAACCTTGCG CCTAAGGGGC ACGGTTCTCA CTGCATTCAT 



1151 CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT 



1201 CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT 



1251 TGGTATAGCT TAGCCTATAG GTGTGGGTTA TTGACCATTA TTGACCACTC 
ACCATATCGA ATCGGATATC CACACCCAAT AACTGGTAAT AACTGGTGAG 



1301 CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG 
GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC 



1351 CCACAACTAT CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT 
GGTGTTGATA GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA 



1401 GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT TATTTACAAA 
CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA ATAAATGTTT 



1451 TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA 
AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT 



1501 TAGCGTGGGA TCTCCGACAT CTCGGGTACG TGTTCCGGAC ATGGGCTCTT 
ATCGCACCCT AGAGGCTGTA GAGCCCATGC ACAAGGCCTG TACCCGAGAA 



1551 CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 



1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT 



1651 CTTAGGCACA GCACAATGCC CACCACCACC AGTGTGCCGC ACAAGGCCGT 
GAATCCGTGT CGTGTTACGG GTGGTGGTGG TCACACGGCG TGTTCCGGCA 



1701 GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT 
CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA 



1751 GGACGCAGAT GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT 
CCTGCGTCTA CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA 



1801 GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT TGCGGTGCTG 
CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA ACGCCACGAC 
1851 TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG 
AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC 



1901 CGCCACCAGA CATAATAGCT GACAGACTAA CAGACTGTTC CTTTCCATGG 
GCGGTGGTCT GTATTATCGA CTGTCTGATT GTCTGACAAG GAAAGGTACC 



+ 2 MAP 

EcoRI 

1951 GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCGCCCA 

CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGCGGGT 



+ 2ITAY AQQ TRGL LGC I I T 
2001 TCACGGCGTA CGCCCAGCAG ACAAGGGGCC TCCTAGGGTG CATAATCACC 
AGTGCCGCAT GCGGGTCGTC TGTTCCCCGG AGGATCCCAC GTATTAGTGG 



+ 2SLTG RDK NQV EGEV Q I V 
2051 AGCCTAACTG GCCGGGACAA AAACCAAGTG GAGGGTGAGG TCCAGATTGT 
TCGGATTGAC CGGCCCTGTT TTTGGTTCAC CTCCCACTCC AGGTCTAACA 



+ 2 STA AQTF LAT CIN GVC 
2101 GTCAACTGCT GCCCAAACCT TCCTGGCAAC GTGCATCAAT GGGGTGTGCT 
CAGTTGACGA CGGGTTTGGA AGGACCGTTG CACGTAGTTA CCCCACACGA 



+ 2WTVY HGA GTRT IAS PKG 
2151 GGACTGTCTA CCACGGGGCC GGAACGAGGA CCATCGCGTC ACCCAAGGGT 
CCTGACAGAT GGTGCCCCGG CCTTGCTCCT GGTAGCGCAG TGGGTTCCCA 



+2PVIQ MYT N V D QDLV GWP 
2201 CCTGTCATCC AGATGTATAC CAATGTAGAC CAAGACCTTG TGGGCTGGCC 
GGACAGTAGG TCTACATATG GTTACATCTG GTTCTGGAAC ACCCGACCGG 



+2 A S Q GTRS LTP CTC GSS 
2251 CGCTTCGCAA GGTACCCGCT CATTGACACC CTGCACTTGC GGCTCCTCGG 
GCGAAGCGTT CCATGGGCGA GTAACTGTGG GACGTGAACG CCGAGGAGCC 



+ 2DLYL VTR HADV IPV RRR 
2301 ACCTTTACCT GGTCACGAGG CACGCCGATG TCATTCCCGT GCGCCGGCGG 
TGGAAATGGA CCAGTGCTCC GTGCGGCTAC AGTAAGGGCA CGCGGCCGCC 



+2GDSR GSL LSP RPIS YLK 
2351 GGTGATAGCA GGGGCAGCCT GCTGTCGCCC CGGCCCATTT CCTACTTGAA 
CCACTATCGT CCCCGTCGGA CGACAGCGGG GCCGGGTAAA GGATGAACTT 



+ 2 GSS GGPL LCP AGH A V G 
2401 AGGCTCCTCG GGGGGTCCGC TGTTGTGCCC CGCGGGGCAC GCCGTGGGCA 
TCCGAGGAGC CCCCCAGGCG ACAACACGGG GCGCCCCGTG CGGCACCCGT 



+2IFRA AVC TRGV A K A V D F 
2451 TATTTAGGGC CGCGGTGTGC ACCCGTGGAG TGGCTAAGGC GGTGGACTTT 
ATAAATCCCG GCGCCACACG TGGGCACCTC ACCGATTCCG CCACCTGAAA 



+2IPVE NLE TTM RSPV FTD 
2501 ATCCCTGTGG AGAACCTAGA GACAACCATG AGGTCCCCGG TGTTCACGGA 
TAGGGACACC TCTTGGATCT CTGTTGGTAC TCCAGGGGCC ACAAGTGCCT 
+2 NSS PPVV PQS FQV A H L 
2551 TAACTCCTCT CCACCAGTAG TGCCCCAGAG CTTCCAGGTG GCTCACCTCC 
ATTGAGGAGA GGTGGTCATC ACGGGGTCTC GAAGGTCCAC CGAGTGGAGG 



+2HAPT GSG KSTK VPA A Y A 
2601 ATGCTCCCAC AGGCAGCGGC AAAAGCACCA AGGTCCCGGC TGCATATGCA 
TACGAGGGTG TCCGTCGCCG TTTTCGTGGT TCCAGGGCCG ACGTATACGT 



+2AQGY KVL VLN PSVA ATL 
2651 GCTCAGGGCT ATAAGGTGCT AGTACTCAAC CCCTCTGTTG CTGCAACACT 
CGAGTCCCGA TATTCCACGA TCATGAGTTG GGGAGACAAC GACGTTGTGA 



+2 GFG AYMS KAH GID PNI 
2701 GGGCTTTGGT GCTTACATGT CCAAGGCTCA TGGGATCGAT CCTAACATCA 
CCCGAAACCA CGAATGTACA GGTTCCGAGT ACCCTAGCTA GGATTGTAGT 



+ 2RTGV RTI TTGS PIT YST 
2751 GGACCGGGGT GAGAACAATT ACCACTGGCA GCCCCATCAC GTACTCCACC 
CCTGGCCCCA CTCTTGTTAA TGGTGACCGT CGGGGTAGTG CATGAGGTGG 



+2YGKF LAD GGC SGGA Y D I 
2801 TACGGCAAGT TCCTTGCCGA CGGCGGGTGC TCGGGGGGCG CTTATGACAT 
ATGCCGTTCA AGGAACGGCT GCCGCCCACG AGCCCCCCGC GAATACTGTA 



+2 IIC DECH STD ATS ILG 
2851 AATAATTTGT GACGAGTGCC ACTCCACGGA TGCCACATCC ATCTTGGGCA 
TTATTAAACA CTGCTCACGG TGAGGTGCCT ACGGTGTAGG TAGAACCCGT 



+ 2IGTV LDQ AETA GAR LVV 
2901 TTGGCACTGT CCTTGACCAA GCAGAGACTG CGGGGGCGAG ACTGGTTGTG 
AACCGTGACA GGAACTGGTT CGTCTCTGAC GCCCCCGCTC TGACCAACAC 



+2 LATA TPP GSV TVPH PNI 
2951 CTCGCCACCG CCACCCCTCC GGGCTCCGTC ACTGTGCCCC ATCCCAACAT 
GAGCGGTGGC GGTGGGGAGG CCCGAGGCAG TGACACGGGG TAGGGTTGTA 



+2 EEV ALST TGE IPF YGK 
3001 CGAGGAGGTT GCTCTGTCCA CCACCGGAGA GATCCCTTTT TACGGCAAGG 
GCTCCTCCAA CGAGACAGGT GGTGGCCTCT CTAGGGAAAA ATGCCGTTCC 



+ 2AIPL EVI KGGR HLI FCH 
3051 CTATCCCCCT CGAAGTAATC AAGGGGGGGA GACATCTCAT CTTCTGTCAT 
GATAGGGGGA GCTTCATTAG TTCCCCCCCT CTGTAGAGTA GAAGACAGTA 



+ 2SKKK CDE L A A K L V A LGI 
3101 TCAAAGAAGA AGTGCGACGA ACTCGCCGCA AAGCTGGTCG CATTGGGCAT 
AGTTTCTTCT TCACGCTGCT TGAGCGGCGT TTCGACCAGC GTAACCCGTA 



+2 N A V AYYR GLD VSV IPT 
3151 CAATGCCGTG GCCTACTACC GCGGTCTTGA CGTGTCCGTC ATCCCGACCA 
GTTACGGCAC CGGATGATGG CGCCAGAACT GCACAGGCAG TAGGGCTGGT 



+2SGDV VVV ATDA LMT GYT 
3201 GCGGCGATGT TGTCGTCGTG GCAACCGATG CCCTCATGAC CGGCTATACC 
CGCCGCTACA ACAGCAGCAC CGTTGGCTAC GGGAGTACTG GCCGATATGG 
+ 2GDFD S V I DCN TCVT QTV 
3251 GGCGACTTCG ACTCGGTGAT AGACTGCAAT ACGTGTGTCA CCCAGACAGT 
CCGCTGAAGC TGAGCCACTA TCTGACGTTA TGCACACAGT GGGTCTGTCA 



+2 DFS LDPT F T I ETI TLP 
3301 CGATTTCAGC CTTGACCCTA CCTTCACCAT TGAGACAATC ACGCTCCCCC 
GCTAAAGTCG GAACTGGGAT GGAAGTGGTA ACTCTGTTAG TGCGAGGGGG 



+ 2QDAV SRT Q R R G R T G RGK 
3351 AAGATGCTGT CTCCCGCACT CAACGTCGGG GCAGGACTGG CAGGGGGAAG 
TTCTACGACA GAGGGCGTGA GTTGCAGCCC CGTCCTGACC GTCCCCCTTC 



+ 2 P G I Y RFV APG ERPS GMF 
3401 CCAGGCATCT ACAGATTTGT GGCACCGGGG GAGCGCCCCT CCGGCATGTT 
GGTCCGTAGA TGTCTAAACA CCGTGGCCCC CTCGCGGGGA GGCCGTACAA 



+2 DSS V L C E CYD AGC AWY 
3451 CGACTCGTCC GTCCTCTGTG AGTGCTATGA CGCAGGCTGT GCTTGGTATG 
GCTGAGCAGG CAGGAGACAC TCACGATACT GCGTCCGACA CGAACCATAC 



+ 2ELTP AET TVRL RAY MNT 
3501 AGCTCACGCC CGCCGAGACT ACAGTTAGGC TACGAGCGTA CATGAACACC 
TCGAGTGCGG GCGGCTCTGA TGTCAATCCG ATGCTCGCAT GTACTTGTGG 



+ 2PGLP VCQ DHL EFWE GVF 
3551 CCGGGGCTTC CCGTGTGCCA GGACCATCTT GAATTTTGGG AGGGCGTCTT 
GGCCCCGAAG GGCACACGGT CCTGGTAGAA CTTAAAACCC TCCCGCAGAA 



+ 2 TGL THID AHF LSQ TKQ 
StuI 



3601 TACAGGCCTC ACTCATATAG ATGCCCACTT TCTATCCCAG ACAAAGCAGA 
ATGTCCGGAG TGAGTATATC TACGGGTGAA AGATAGGGTC TGTTTCGTCT 



+2SGEN LPY LVAY QAT VCA 
3651 GTGGGGAGAA CCTTCCTTAC CTGGTAGCGT ACCAAGCCAC CGTGTGCGCT 
CACCCCTCTT GGAAGGAATG GACCATCGCA TGGTTCGGTG GCACACGCGA 



+ 2RAQA PPP SWD QMWK CLI 
3701 AGGGCTCAAG CCCCTCCCCC ATCGTGGGAC CAGATGTGGA AGTGTTTGAT 
TCCCGAGTTC GGGGAGGGGG TAGCACCCTG GTCTACACCT TCACAAACTA 



+2 RLK PTLH GPT PLL YRL 
3751 TCGCCTCAAG CCCACCCTCC ATGGGCCAAC ACCCCTGCTA TACAGACTGG 
AGCGGAGTTC GGGTGGGAGG TACCCGGTTG TGGGGACGAT ATGTCTGACC 



+2GAVQ N E I TLTH PVT KYI 
3801 GCGCTGTTCA GAATGAAATC ACCCTGACGC ACCCAGTCAC CAAATACATC 
CGCGACAAGT CTTACTTTAG TGGGACTGCG TGGGTCAGTG GTTTATGTAG 



+2MTCM SAD LEV VTST WVL 
3851 ATGACATGCA TGTCGGCCGA CCTGGAGGTC GTCACGAGCA CCTGGGTGCT 
TACTGTACGT ACAGCCGGCT GGACCTCCAG CAGTGCTCGT GGACCCACGA 



+ 2 VGG V L A A L A A YCL STG 
3901 CGTTGGCGGC GTCCTGGCTG CTTTGGCCGC GTATTGCCTG TCAACAGGCT 
GCAACCGCCG CAGGACCGAC GAAACCGGCG CATAACGGAC AGTTGTCCGA 
+ 2 C V V I V G R VVLS GKP AII 
3951 GCGTGGTCAT AGTGGGCAGG GTCGTCTTGT CCGGGAAGCC GGCAATCATA 
CGCACCAGTA TCACCCGTCC CAGCAGAACA GGCCCTTCGG CCGTTAGTAT 



+ 2PDRE VLY R E F DEME EC 
4001 CCTGACAGGG AAGTCCTCTA CCGAGAGTTC GATGAGATGG AAGAGTGCTA 
GGACTGTCCC TTCAGGAGAT GGCTCTCAAG CTACTCTACC TTCTCACGAT 



BamHI MluI



4051 GGATCCACTA CGCGTTAGAG CTCGCTGATC AGCCTCGACT GTGCCTTCTA 
CCTAGGTGAT GCGCAATCTC GAGCGACTAG TCGGAGCTGA CACGGAAGAT 



4101 GTTGCCAGCC ATCTGTTGTT TGCCCCTCCC CCGTGCCTTC CTTGACCCTG 
CAACGGTCGG TAGACAACAA ACGGGGAGGG GGCACGGAAG GAACTGGGAC 



4151 GAAGGTGCCA CTCCCACTGT CCTTTCCTAA TAAAATGAGG AAATTGCATC 
CTTCCACGGT GAGGGTGACA GGAAAGGATT ATTTTACTCC TTTAACGTAG 



4201 GCATTGTCTG AGTAGGTGTC ATTCTATTCT GGGGGGTGGG GTGGGGCAGG 
CGTAACAGAC TCATCCACAG TAAGATAAGA CCCCCCACCC CACCCCGTCC 



4251 ACAGCAAGGG GGAGGATTGG GAAGACAATA GCAGGCATGC TGGGGAGCTC 
TGTCGTTCCC CCTCCTAACC CTTCTGTTAT CGTCCGTACG ACCCCTCGAG 



4301 TTCCGCTTCC TCGCTCACTG ACTCGCTGCG CTCGGTCGTT CGGCTGCGGC 
AAGGCGAAGG AGCGAGTGAC TGAGCGACGC GAGCCAGCAA GCCGACGCCG 



4351 GAGCGGTATC AGCTCACTCA AAGGCGGTAA TACGGTTATC CACAGAATCA 
CTCGCCATAG TCGAGTGAGT TTCCGCCATT ATGCCAATAG GTGTCTTAGT 



4401 GGGGATAACG CAGGAAAGAA CATGTGAGCA AAAGGCCAGC AAAAGGCCAG 
CCCCTATTGC GTCCTTTCTT GTACACTCGT TTTCCGGTCG TTTTCCGGTC 



4451 GAACCGTAAA AAGGCCGCGT TGCTGGCGTT TTTCCATAGG CTCCGCCCCC 
CTTGGCATTT TTCCGGCGCA ACGACCGCAA AAAGGTATCC GAGGCGGGGG 



4501 CTGACGAGCA TCACAAAAAT CGACGCTCAA GTCAGAGGTG GCGAAACCCG 
GACTGCTCGT AGTGTTTTTA GCTGCGAGTT CAGTCTCCAC CGCTTTGGGC 



4551 ACAGGACTAT AAAGATACCA GGCGTTTCCC CCTGGAAGCT CCCTCGTGCG 
TGTCCTGATA TTTCTATGGT CCGCAAAGGG GGACCTTCGA GGGAGCACGC 



4601 CTCTCCTGTT CCGACCCTGC CGCTTACCGG ATACCTGTCC GCCTTTCTCC 
GAGAGGACAA GGCTGGGACG GCGAATGGCC TATGGACAGG CGGAAAGAGG 



4651 CTTCGGGAAG CGTGGCGCTT TCTCAATGCT CACGCTGTAG GTATCTCAGT 
GAAGCCCTTC GCACCGCGAA AGAGTTACGA GTGCGACATC CATAGAGTCA 



4701 TCGGTGTAGG TCGTTCGCTC CAAGCTGGGC TGTGTGCACG AACCCCCCGT 
AGCCACATCC AGCAAGCGAG GTTCGACCCG ACACACGTGC TTGGGGGGCA 



4751 TCAGCCCGAC CGCTGCGCCT TATCCGGTAA CTATCGTCTT GAGTCCAACC 
AGTCGGGCTG GCGACGCGGA ATAGGCCATT GATAGCAGAA CTCAGGTTGG 



4801 CGGTAAGACA CGACTTATCG CCACTGGCAG CAGCCACTGG TAACAGGATT 
GCCATTCTGT GCTGAATAGC GGTGACCGTC GTCGGTGACC ATTGTCCTAA 
4851 AGCAGAGCGA GGTATGTAGG CGGTGCTACA GAGTTCTTGA AGTGGTGGCC 
TCGTCTCGCT CCATACATCC GCCACGATGT CTCAAGAACT TCACCACCGG 



4901 TAACTACGGC TACACTAGAA GGACAGTATT TGGTATCTGC GCTCTGCTGA 
ATTGATGCCG ATGTGATCTT CCTGTCATAA ACCATAGACG CGAGACGACT 



4951 AGCCAGTTAC CTTCGGAAAA AGAGTTGGTA GCTCTTGATC CGGCAAACAA 
TCGGTCAATG GAAGCCTTTT TCTCAACCAT CGAGAACTAG GCCGTTTGTT 



5001 ACCACCGCTG GTAGCGGTGG TTTTTTTGTT TGCAAGCAGC AGATTACGCG 
TGGTGGCGAC CATCGCCACC AAAAAAACAA ACGTTCGTCG TCTAATGCGC 



5051 CAGAAAAAAA GGATCTCAAG AAGATCCTTT GATCTTTTCT ACGGGGTCTG 
GTCTTTTTTT CCTAGAGTTC TTCTAGGAAA CTAGAAAAGA TGCCCCAGAC 



5101 ACGCTCAGTG GAACGAAAAC TCACGTTAAG GGATTTTGGT CATGAGATTA 
TGCGAGTCAC CTTGCTTTTG AGTGCAATTC CCTAAAACCA GTACTCTAAT 



5151 TCAAAAAGGA TCTTCACCTA GATCCTTTTA AATTAAAAAT GAAGTTTTAA 
AGTTTTTCCT AGAAGTGGAT CTAGGAAAAT TTAATTTTTA CTTCAAAATT 



5201 ATCAATCTAA AGTATATATG AGTAAACTTG GTCTGACAGT TACCAATGCT 
TAGTTAGATT TCATATATAC TCATTTGAAC CAGACTGTCA ATGGTTACGA



5251 TAATCAGTGA GGCACCTATC TCAGCGATCT GTCTATTTCG TTCATCCATA 
ATTAGTCACT CCGTGGATAG AGTCGCTAGA CAGATAAAGC AAGTAGGTAT 



5301 GTTGCCTGAC TCCCCGTCGT GTAGATAACT ACGATACGGG AGGGCTTACC 
CAACGGACTG AGGGGCAGCA CATCTATTGA TGCTATGCCC TCCCGAATGG 



5351 ATCTGGCCCC AGTGCTGCAA TGATACCGCG AGACCCACGC TCACCGGCTC 
TAGACCGGGG TCACGACGTT ACTATGGCGC TCTGGGTGCG AGTGGCCGAG 



5401 CAGATTTATC AGCAATAAAC CAGCCAGCCG GAAGGGCCGA GCGCAGAAGT
GTCTAAATAG TCGTTATTTG GTCGGTCGGC CTTCCCGGCT CGCGTCTTCA 



5451 GGTCCTGCAA CTTTATCCGC CTCCATCCAG TCTATTAATT GTTGCCGGGA 
CCAGGACGTT GAAATAGGCG GAGGTAGGTC AGATAATTAA CAACGGCCCT 



5501 AGCTAGAGTA AGTAGTTCGC CAGTTAATAG TTTGCGCAAC GTTGTTGCCA 
TCGATCTCAT TCATCAAGCG GTCAATTATC AAACGCGTTG CAACAACGGT 



5551 TTGCTACAGG CATCGTGGTG TCACGCTCGT CGTTTGGTAT GGCTTCATTC 
AACGATGTCC GTAGCACCAC AGTGCGAGCA GCAAACCATA CCGAAGTAAG 



5601 AGCTCCGGTT CCCAACGATC AAGGCGAGTT ACATGATCCC CCATGTTGTG 
TCGAGGCCAA GGGTTGCTAG TTCCGCTCAA TGTACTAGGG GGTACAACAC 



5651 CAAAAAAGCG GTTAGCTCCT TCGGTCCTCC GATCGTTGTC AGAAGTAAGT 
GTTTTTTCGC CAATCGAGGA AGCCAGGAGG CTAGCAACAG TCTTCATTCA 



5701 TGGCCGCAGT GTTATCACTC ATGGTTATGG CAGCACTGCA TAATTCTCTT 
ACCGGCGTCA CAATAGTGAG TACCAATACC GTCGTGACGT ATTAAGAGAA 



5751 ACTGTCATGC CATCCGTAAG ATGCTTTTCT GTGACTGGTG AGTACTCAAC 
TGACAGTACG GTAGGCATTC TACGAAAAGA CACTGACCAC TCATGAGTTG 
5801 CAAGTCATTC TGAGAATAGT GTATGCGGCG ACCGAGTTGC TCTTGCCCGG
GTTCAGTAAG ACTCTTATCA CATACGCCGC TGGCTCAACG AGAACGGGCC 



5851 CGTCAATACG GGATAATACC GCGCCACATA GCAGAACTTT AAAAGTGCTC 
GCAGTTATGC CCTATTATGG CGCGGTGTAT CGTCTTGAAA TTTTCACGAG 



5901 ATCATTGGAA AACGTTCTTC GGGGCGAAAA CTCTCAAGGA TCTTACCGCT 
TAGTAACCTT TTGCAAGAAG CCCCGCTTTT GAGAGTTCCT AGAATGGCGA 



5951 GTTGAGATCC AGTTCGATGT AACCCACTCG TGCACCCAAC TGATCTTCAG 
CAACTCTAGG TCAAGCTACA TTGGGTGAGC ACGTGGGTTG ACTAGAAGTC 



6001 CATCTTTTAC TTTCACCAGC GTTTCTGGGT GAGCAAAAAC AGGAAGGCAA 
GTAGAAAATG AAAGTGGTCG CAAAGACCCA CTCGTTTTTG TCCTTCCGTT 



6051 AATGCCGCAA AAAAGGGAAT AAGGGCGACA CGGAAATGTT GAATACTCAT 
TTACGGCGTT TTTTCCCTTA TTCCCGCTGT GCCTTTACAA CTTATGAGTA 



6101 ACTCTTCCTT TTTCAATATT ATTGAAGCAT TTATCAGGGT TATTGTCTCA 
TGAGAAGGAA AAAGTTATAA TAACTTCGTA AATAGTCCCA ATAACAGAGT 



6151 TGAGCGGATA CATATTTGAA TGTATTTAGA AAAATAAACA AATAGGGGTT 
ACTCGCCTAT GTATAAACTT ACATAAATCT TTTTATTTGT TTATCCCCAA 



6201 CCGCGCACAT TTCCCCGAAA AGTGCCACCT GACGTCTAAG AAACCATTAT 
GGCGCGTGTA AAGGGGCTTT TCACGGTGGA CTGCAGATTC TTTGGTAATA 



6251 TATCATGACA TTAACCTATA AAAATAGGCG TATCACGAGG CCCTTTCGTC 
ATAGTACTGT AATTGGATAT TTTTATCCGC ATAGTGCTCC GGGAAAGCAG 
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuVal
2 AGCTTACAAAACAAATTCACCATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTA 
TCGAATGTTTTGTTTAAGTGGTACCGACGTATACGTCGAGTCCCGATATTCCACGATCAT 

1 HIND3, 21 NCOI, 30 NDEI, 58 SCAI, 

LeuAsnProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGly 
62 CTCAACCCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGG 
GAGTTGGGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCC 

IleAspProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyr
122 ATCGATCCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTAC 
TAGCTAGGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATG 

122 CLAI, 

SerThrTyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIle
182 TCCACCTACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATA 
AGGTGGATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTAT 

IleCysAspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeu
242 ATTTGTGACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTT 
TAAACACTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAA 

AspGlnAlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGly 
302 GACCAAGCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGC 
CTGGTTCGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCG 

309 ALWN1, 

SerValThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIle
362 TCCGTCACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATC 
AGGCAGTGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAG 

ProPheTyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePhe
422 CCTTTTTACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTC
GGAAAAATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAG 

CysHisSerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsn
482 TGTCATTCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAAT
ACAGTAAGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTA 

AlaValAlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValVal
542 GCCGTGGCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTC
CGGCACCGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAG 

A A 

556 SAC2 , 566 DRD1, 

ValValAlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAsp
602 GTCGTGGCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGAC 
CAGCACCGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTG 

621 BSPH1, 

CysAsnThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGlu
662 TGCAATACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAG
ACGTTATGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTC 

ThrIleThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArg
722 ACAATCACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGG
TGTTAGTGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCC

GlyLysProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAsp
782 GGGAAGCCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGAC 
CCCTTCGGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTG 

822 BGLI, 839 DRD1 , 

SerSerValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAla 
842 TCGTCCGTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCC
AGCAGGCAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGG 

/\ 

887 SACI, 

GluThrThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAsp 
902 GAGACTACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGAC 
CTCTGATGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTG 



937 SMAI XMAI, 

HisLeuGluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeu 
962 CATCTTGAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTA 
GTAGAACTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGAT 



991 STUI, 

SerGlnThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrVal 
1022 TCCCAGACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTG 
AGGGTCTGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCAC 

A 

1075 DRA3, 

CysAlaArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArg 
1082 TGCGCTAGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGC 
ACGCGATCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCG 

LeuLysProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsn 
1142 CTCAAGCCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAAT 
GAGTTCGGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTA 



1156 NCOI, 

GluIleThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeu
1202 GAAATCACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTG
CTTTAGTGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGAC 

A a a a a 

1236 BSPH1, 1240 DRD1, 1243 AVA3, 1251 EAG1 XMA3, 1256 DRD1, 



GluValValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyr 
1262 GAGGTCGTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTAT 
CTCCAGCAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATA 
CysLeuSerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAla
1322 TGCCTGTCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCA 
ACGGACAGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGT 

* A 

1375 NAEI, 

IleIleProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGln
1382 ATCATACCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAG 
TAGTATGGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTC 

A 

1391 DRD1, 

HisLeuProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeu
1442 CACTTACCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTC 
GTGAATGGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAG 

GlyLeuLeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsn
1502 GGCCTCCTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAAC 
CCGGAGGACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTG 

/\ A 

1508 PSTI, 1513 TTH3I, 

TrpGlnLysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGln
1562 TGGCAAAAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAA
ACCGTTTTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTT 

A A 

1571 XHOI, 1592 NDEI, 

TyrLeuAlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPhe
1622 TACTTGGCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTT 
ATGAACCGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAA 

A 

1649 BSTE2, 

ThrAlaAlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGly
1682 ACAGCTGCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGG 
TGTCGACGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCC 

A 

1683 ALWN1 PVU2, 

GlyTrpValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGly 
1742 GGGTGGGTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGC
CCCACCCACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCG 

A 

1800 ESP1, 

LeuAlaGlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAla
1802 TTAGCTGGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCA 
AATCGACCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGT 

A 

1808 KAS1 NARI,

GlyTyrGlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluVal 
1862 GGGTATGGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTC 
CCCATACCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAG 
1884 SACI, 1905 BSPH1,

ProSerThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuVal
1922 CCCTCCACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTA 
GGGAGGTGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCAT 

1934 TTH3I, 

ValGlyValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaVal
1982 GTCGGCGTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTG 
CAGCCGCACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCAC 

A A 

2010 NAEI, 2023 SMAI XMAI,

GlnTrpMetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHis 
2042 CAGTGGATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCAC 
GTCACCTACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTG 

A A 

2073 SMAI XMAI, 2099 DRA3,

TyrValProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrVal
2102 TACGTGCCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTA 
ATGCACGGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACAT 

A 

2121 PVU2, 

ThrGlnLeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSer 
2162 ACCCAGCTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCC 
TGGGTCGAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGG 

A A 

2165 ALWN1, 2170 MST2, 

GlySerTrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThr 
2222 GGTTCCTGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACC 
CCAAGGACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGG 

A 

2226 ECON1, 

TrpLeuLysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArg
2282 TGGCTAAAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGC 
ACCGATTTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCG 

A A A 

2291 ESP1, 2306 PVU2, 2316 BAMHI, 

GlyTyrLysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAla
2342 GGGTATAAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCT
CCCATATTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGA 

GluIleThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArg
2402 GAGATCACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGG
CTCTAGTGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCC 

/\ AAA 

2431 BSAB1 , 2447 AVR2, 2454 SSE83871, 2455 PSTI, 

AsnMetTrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeu 
2462 AACATGTGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTT
TTGTACACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAA 
2486 ASE1, 2503 APAI,

ProAlaProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIle
2522 CCTGCGCCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATA 
GGACGCGGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTAT 

2559 PSTI, 

ArgGlnValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysPro 
2582 AGGCAGGTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCG
TCCGTCCACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGC 

A 

2600 DRA3,

CysGlnValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPhe 
2642 TGCCAGGTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTT 
ACGGTCCAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAA 

AlaProProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGlu 
2702 GCGCCCCCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAA 
CGCGGGGGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTT 

TyrProValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSer 
2762 TACCCGGTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCC
ATGGGCCATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGG 

^ A 

2763 HGIE2 , 2815 AAT2, 

MetLeuThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGly 
2822 ATGCTCACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGA 
TACGAGTGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCT 

A 

2856 EAG1 XMA3, 

SerProProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAla 
2882 TCACCCCCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCA 
AGTGGGGGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGT 

A A 

2895 BALI, 2909 NHEI, 

ThrCysThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrp 
2942 ACTTGCACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGG
TGAACGTGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACC 

A A 

2972 ESP1, 2975 SACI, 

ArgGlnGluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeu
3002 AGGCAGGAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTG 
TCCGTCCTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGAC 

AspSerPheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGlu
3062 GACTCCTTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAA 
CTGAGGAAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTT 

3102 BGL2, 
IleLeuArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyr
3122 ATCCTGCGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTAT 
TAGGACGCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATA 

A A 

3149 ALWN1, 3170 EAG1 XMA3,

AsnProProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGly 
3182 AACCCCCCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGC
TTGGGGGGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCG 

A /\ 

3223 HGIE2 , 3235 NCOI, 

CysProLeuProProProLysSerProProValProProProArgLysLysArgThrVal 
3242 TGCCCGCTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTG 
ACGGGCGAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCAC 

ValLeuThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGly 
3302 GTCCTCACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGC 
CAGGAGTGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCG 

A A 

3338 SACI, 3352 HIND3, 

SerSerSerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaPro
3362 AGCTCCTCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCT 
TCGAGGAGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGA 

SerGlyCysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGly 
3422 TCTGGCTGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGG 
AGACCGACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCC 

A 

3443 EAM11051, 

GluProGlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsn 
3482 GAGCCTGGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAAC 
CTCGGACCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTG 

A A A 

3490 BAMHI, 3491 BSAB1, 3493 BSPE1,

AlaGluAspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrPro 
3542 GCGGAGGATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCG 
CGCCTCCTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGC 

A 

3595 DRA3, 

CysAlaAlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHis 
3602 TGCGCCGCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCAC 
ACGCGGCGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTG 

A A A. 

3606 SAC2, 3617 ALWN1, 3661 PFLM1, 

HisAsnLeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThr 
3662 CACAATTTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACA 
GTGTTAAACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGT 

3687 DRA3, 

PheAspArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAla 
3722 TTTGACAGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCA
AAACTGTCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGT 

AlaAlaSerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrPro
3782 GCGGCGTCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCC
CGCCGCAGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGG 

A 

3822 HIND3, 

ProHisSerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArg 
3842 CCACACTCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGA
GGTGTGAGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCT 

3881 AAT2, 3896 BGLI,

LysAlaValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrPro
3902 AAGGCCGTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCA 
TTCCGGCATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGT 

IleAspThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGly
3962 ATAGACACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGT 
TATCTGTGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCA 

ArgLysProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMet
4022 CGTAAGCCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATG 
GCATTCGGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTAC 

AlaLeuTyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPhe 
4082 GCTTTGTACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTC 
CGAAACATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAG 

GlnTyrSerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThr 
4142 CAATACTCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACC 
GTTATGAGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGG 

4166 ECORI, 

ProMetGlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIle 
4202 CCAATGGGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATC 
GGTTACCCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAG 

4235 DRD1, 4242 ALWN1, 

ArgThrGluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIle
4262 CGTACGGAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATC 
GCATGCCTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAG 

A A 

4307 BGLI, 4314 BALI, 

LysSerLeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsn 
4322 AAGTCCCTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAAC 
TTCAGGGAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTG 

A 

4351 APAI,

CysGlyTyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeu
4382 TGCGGCTATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTC
ACGCCGATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAG



ThrCysTyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMet
4442 ACTTGCTACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATG 
TGAACGATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTAC 

A 

4458 SMAI XMAI, 

LeuValCysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAla
4502 CTCGTGTGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCG 
GAGCACACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGC 

A A 

4514 DRD1, 4517 TTH3I, 

AlaSerLeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspPro 
4562 GCGAGCCTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCC
CGCTCGGACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGG 

ProGlnProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAla 
4622 CCACAACCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCC
GGTGTTGGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGG 

A 

4643 SACI, 

HisAspGlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAla 
4682 CACGACGGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCG
GTGCTGCCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGC 

A 

4737 NRUI, 

ArgAlaAlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIle
4742 AGAGCTGCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATC
TCTCGACGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAG

MetPheAlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeu
4802 ATGTTTGCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTT
TACAAACGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAA 

A A 

4812 PFLM1, 4813 DRA3, 

IleAlaArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSer 
4862 ATAGCCAGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCC 
TATCGGTCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGG 

A 

4899 BGL2, 

IleGluProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSer
4922 ATAGAACCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCA
TATCTTGGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGT 

A 

4960 NCOI, 

LeuHisSerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGly 
4982 CTCCACAGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGG
GAGGTGTCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCC 



5021 SPHI, 5041 KPNI , 
ValProProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAla
5042 GTACCGCCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCC 
CATGGCGGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGG 

A A 

5070 APAI, 5097 BALI,

ArgGlyGlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLys
5102 AGAGGAGGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAG 
TCTCCTCCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTC 

A 

5119 NDEI, 

LeuLysLeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAla 
5162 CTCAAACTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCT 
GAGTTTGAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGA 

A A A A 

5180 NOTI, 5181 EAG1 XMA3, 5188 BALI , 5192 PVU2, 

GlyTyrSerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrp 
5222 GGCTACAGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGG 
CCGATGTCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACC 

A 

5246 DRA3,

PheCysLeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgOP
5282 TTTTGCCTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGATGAAGG 
AAAACGGATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTACTTCC 

A A 

5301 PSTI, 5331 HGIE2, 



5342 TTGGGGTAAACACTCCGGCCTAAAAAAAAAAAAAAATCTAGAACCCGAGTCGAC 
AACCCCATTTGTGAGGCCGGATTTTTTTTTTTTTTTAGATCTTGGGCTCAGCTG 

A A 

5378 XBAI, 5390 SALI, 
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG

A A /V 

1 HIND3, 24 NDEI , 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

A 

116 CLAI , 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT 
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal

482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG

AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal

542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG

CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

A A 

550 SAC2, 560 DRD1,

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 

A 

615 BSPH1, 



ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC

TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG

* 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG

816 BGLI, 833 DRD1,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA

881 SACI,

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC

985 STUI,

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC
SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT

A 

1369 NAEI , 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

1385 DRD1 , 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu

1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC

GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln

1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA

GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I , 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI , 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla 

1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT

CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESP1, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr

1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT

CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA

1802 KAS1 NARI , 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 



1878 SACI, 1899 BSPH1, 



FIGURE 14 - Page 4 



ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

A 

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

A A 

2004 NAEI, 2017 SMAI XMAI , 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A A 

2067 SMAI XMAI , 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2 , 2159 ALWN1, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

A 

2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

A A /V 

2285 ESP1, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

yv A A A 

2425 BSAB1 , 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A ^ 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2,

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

2850 EAG1 XMA3,

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

2889 BALI, 2903 NHEI, 

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln 
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

2966 ESP1, 2969 SACI, 

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2,

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG

3143 ALWN1, 3164 EAG1 XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

3217 HGIE2, 3229 NCOI,

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC 
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

3332 SACI, 3346 HIND3,

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1,

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 

3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

3611 ALWN1, 3655 PFLM1,

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 
SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3,

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG

3875 AAT2, 3890 BGLI,

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 

4229 DRD1, 4236 ALWN1,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 
TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRD1, 4511 TTH3I,

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLM1, 4807 DRA3,

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT

4893 BGL2,

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC

5015 SPHI, 5035 KPNI,

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgOP
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGATGAATAGTCGAC 
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTACTTATCAGCTG 



5295 PSTI, 5336 SALI, 

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT 
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 
SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG 
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRD1, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

615 BSPH1, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys 
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRD1,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC 
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,

ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

1385 DRD1,

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I,

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI,

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 
ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA

1794 ESP1, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPH1,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWN1,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 



2285 ESP1, 2300 PVU2, 2310 BAMHI, 

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

2425 BSAB1, 2441 AVR2, 2448 SSE8387I, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC 
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG

2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG

2889 BALI, 2903 NHEI,

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

2966 ESP1, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2,

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

3143 ALWN1, 3164 EAG1 XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT 
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1,

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 

3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 
3611 ALWN1, 3655 PFLM1, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG

3681 DRA3,

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC 
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG 
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 

4229 DRD1, 4236 ALWN1,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRD1, 4511 TTH3I,

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLM1, 4807 DRA3,

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT

4893 BGL2,

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI,

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC 
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 

5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2,

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

5546 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysOC AM 
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGTAATAGTCG
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCATTATCAGC 

5650 APAI, 5698 SALI,



5702 AC 
TG 

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT 
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1,

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRD1, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 



615 BSPH1, 
ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC 
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 


816 BGLI, 833 DRD1,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 


881 SACI,

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 


985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 


1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC 

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 


1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1, 



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 


1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

1385 DRD1,

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I,

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 


1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla

1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT

CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 


1794 ESP1, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 


1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 

1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC

CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 


1878 SACI, 1899 BSPH1,
ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 


1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 


2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 


2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 


2115 PVU2, 2159 ALWN1,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 


2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 


2285 ESP1, 2300 PVU2, 2310 BAMHI, 

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 


2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 



2480 ASE1, 2497 APAI, 
ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 


2553 PSTI,

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 


2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 


2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 


2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 


2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 


2889 BALI, 2903 NHEI,

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln 
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 


2966 ESP1, 2969 SACI, 

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 


3096 BGL2,



ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 


3143 ALWN1, 3164 EAG1 XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 


3217 HGIE2, 3229 NCOI,

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 


3332 SACI, 3346 HIND3,

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT 
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 


3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 


3484 BAMHI, 3485 BSAB1, 3487 BSPE1, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 


3589 DRA3, 3600 SAC2, 

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 


3611 ALWN1, 3655 PFLM1,

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 


3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG


3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 


3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 


4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 


4229 DRD1, 4236 ALWN1, 

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 


4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 


4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 
TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 


4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC 
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRD1, 4511 TTH3I,

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 


4806 PFLM1, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 



5015 SPHI, 5035 KPNI, 
ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 


5064 APAI, 5091 BALI,

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 


5113 NDEI,

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 


5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2, 

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 


5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT 
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 


5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 


5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 


5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 


5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 


5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI,

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 
ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 


5650 APAI, 5696 CLAI, 

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValGlyAlaProLeu
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCCCTCTT 
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGCCGCGGGGAGAA 


5724 HGIE2, 5750 KAS1 NARI, 5756 ECON1,

GlyGlyAlaAlaArgAlaLeuAlaHisGlyValArgValLeuGluAspGlyValAsnTyr 
5762 GGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAACTAT
CCTCCGCGACGGTCCCGGGACCGCGTACCGCAGGCCCAAGACCTTCTGCCGCACTTGATA 


5772 BSTXI, 5775 APAI, 

AlaThrGlyAsnLeuProGlyCysSerOC AM 
5822 GCAACAGGGAACCTTCCTGGTTGCTCTTAATAGTCGAC 
CGTTGTCCCTTGGAAGGACCAACGAGAATTATCAGCTG 


5854 SALI, 



FIGURE 19
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 


1 HIND3, 24 NDEI, 52 SCAI,

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 


116 CLAI,

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 
SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 


550 SAC2, 560 DRD1, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 

615 BSPH1, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 


816 BGLI, 833 DRD1, 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 


985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC 
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 


1150 NCOI,

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 


1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 


1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 


1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 


1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 


1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 
ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla

1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT

CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 


1794 ESP1,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KAS1 NARI,

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer

1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC

CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 


1878 SACI, 1899 BSPH1, 

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 


1928 TTH3I, 

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 


2004 NAEI, 2017 SMAI XMAI,

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 


2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 


2115 PVU2, 2159 ALWN1,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 


2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 


2285 ESP1, 2300 PVU2, 2310 BAMHI, 
LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG 
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 


2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 


2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 


2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 


2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC 
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 


2757 HGIE2,

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 


2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 


2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 



2889 BALI, 2903 NHEI, 
ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 


2966 ESP1, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 


3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 


3143 ALWN1, 3164 EAG1 XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 


3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 


3332 SACI, 3346 HIND3,

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

A 

3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 
3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

A A 

3611 ALWN1, 3655 PFLM1,

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

A

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 

3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC

AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 

3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC

AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

A A 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG 
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC 
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 
4229 DRD1, 4236 ALWN1,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC 
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC 
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

A

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

A A 

4508 DRD1, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

A 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

A 

4731 NRUI,

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG

A A 

4806 PFLM1, 4807 DRA3,



ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA

TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

A

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

A 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro

4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG

TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 

5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA

GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

A A 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

A 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC 
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

A A A A 

5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC
5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 

A A A A 

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2,

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

A A A A 

5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI,

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT

GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 
A A

5650 APAI, 5696 CLAI, 

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValOC AM
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCTAATAGTCGAC
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGATTATCAGCTG 

A A 

5724 HGIE2, 5755 SALI, 
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 
SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG 
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

A A 

550 SAC2, 560 DRD1,

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 

615 BSPH1, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys 
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

A A 

816 BGLI, 833 DRD1, 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

A 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

A 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

A 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

A 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG

A 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

A A A A A

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

A 

1369 NAEI,

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

A 

1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

A A 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

A A

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

A A

1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 
ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

A 

1794 ESP1,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

A 

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

A A 

1878 SACI, 1899 BSPH1, 

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I, 

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

A A 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG 
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A A 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

A A 

2115 PVU2, 2159 ALWN1, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 



2285 ESP1, 2300 PVU2, 2310 BAMHI, 
LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A A A 

2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A A 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

A 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

A 

2594 DRA3 , 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

A 

2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI, 2903 NHEI, 
ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

A A 

2966 ESP1, 2969 SACI, 

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

A 

3096 BGL2,

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

A A 

3143 ALWN1, 3164 EAG1 XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

A A 

3217 HGIE2, 3229 NCOI,

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A A

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT 
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1,

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 
3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA

A A 

3611 ALWN1, 3655 PFLM1, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

A 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

A 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC 
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

A A 

3875 AAT2, 3890 BGLI,

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 

3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC

CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 

4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG

GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

A 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 
4229 DRD1, 4236 ALWN1,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer

4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC 

CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

A A

4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC 
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys
4 382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC 
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

A 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

A A 

4508 DRD1, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

A 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

A 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

A A 

4806 PFLM1, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

A 

4893 BGL2,

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

A 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

A A

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

A 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2, 

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

A A A A 

5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC
5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT 
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 

A A A A

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2,

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

A A A A

5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI,

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

A A

5650 APAI, 5696 CLAI,

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValGlyAlaProLeu
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCCCTCTT 
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGCCGCGGGGAGAA 

A A A 

5724 HGIE2, 5750 KAS1 NARI, 5756 ECON1, 

GlyGlyAlaAlaArgAlaOC AM 
5762 GGAGGCGCTGCCAGGGCCTAATAGTCGAC
CCTCCGCGACGGTCCCGGATTATCAGCTG 

A 

5785 SALI,



Atty Dkt No. PP01617.002
2302-1617 

COMBINED DECLARATION AND POWER OF ATTORNEY 
FOR UTILITY PATENT APPLICATION 

AS A BELOW-NAMED INVENTOR, I HEREBY DECLARE THAT: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, 
first and joint inventor (if more than one name is listed below) of the subject matter which is 
claimed and for which a patent is sought on the invention entitled: NOVEL HCV NON- 
STRUCTURAL POLYPEPTIDE the specification of which 

X is attached hereto 
was filed on 

and assigned Serial No. 

I HAVE REVIEWED AND UNDERSTAND THE CONTENTS OF THE ABOVE-IDENTIFIED 
SPECIFICATION, INCLUDING THE CLAIMS, AS AMENDED BY ANY AMENDMENT 
REFERRED TO ABOVE. 

I acknowledge and understand that I am an individual who has a duty to disclose information which 
is material to the patentability of the claims of this application in accordance with Title 37, Code of 
Federal Regulations, §§ 1.56(a) and (b) which state: 

(a) A patent by its very nature is affected with a public interest. The public interest is 
best served, and the most effective patent examination occurs when, at the time an 
application is being examined, the Office is aware of and evaluates the teachings of 
all information material to patentability. Each individual associated with the filing 
and prosecution of a patent application has a duty of candor and good faith in dealing 
with the Office, which includes a duty to disclose to the Office all information 
known to that individual to be material to patentability as defined in this section. 
The duty to disclose information exists with respect to each pending claim until the 
claim is canceled or withdrawn from consideration, or the application becomes 
abandoned. Information material to the patentability of a claim that is canceled or 
withdrawn from consideration need not be submitted if the information is not 
material to the patentability of any claim remaining under consideration in the 
application. There is no duty to submit information which is not material to the 
patentability of any existing claim. The duty to disclose all information known to be 
material to patentability is deemed to be satisfied if all information known to be 
material to patentability of any claim issued in a patent was cited by the Office or 
submitted to the Office in the manner prescribed by §§ 1.97(b)-(d) and 1.98. 
However, no patent will be granted on an application in connection with which fraud 
on the Office was practiced or attempted or the duty of disclosure was violated 
through bad faith or intentional misconduct. The Office encourages applicants to
carefully examine: 

(1) prior art cited in search reports of a foreign patent office in a counterpart 
application, and 

(2) the closest information over which individuals associated with the filing 
or prosecution of a patent application believe any pending claim patentably defines, 
to make sure that any material information contained therein is disclosed to the 
Office. 

(b) Under this section, information is material to patentability when it is not 
cumulative to information already of record or being made of record in the 
application, and 

(1) It establishes, by itself or in combination with other information, a prima 
facie case of unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in:

(i) Opposing an argument of unpatentability relied on by the Office, or

(ii) Asserting an argument of patentability. 

A prima facie case of unpatentability is established when the information compels a 
conclusion that a claim is unpatentable under the preponderance of evidence, burden- 
of-proof standard, giving each term in the claim its broadest reasonable construction 
consistent with the specification, and before any consideration is given to evidence 
which may be submitted in an attempt to establish a contrary conclusion of 
patentability. 

I do not know and do not believe this invention was ever known or used in the United States of 
America before my or our invention thereof, or patented or described in any printed publication in 
any country before my or our invention thereof or more than one year prior to said application. This 
invention was not in public use or on sale in the United States of America more than one year prior 
to this application. This invention has not been patented or made the subject of an inventor's 
certificate issued before the date of this application in any country foreign to the United States of 
America on any application filed by me or my legal representatives or assigns more than six months 
prior to this application. 

I hereby claim priority benefits under Title 35, United States Code § 119(e)(1) of any United States
provisional application(s) for patent as indicated below and have also identified below any 
application for patent on this invention having a filing date before that of the application for patent 
on which priority is claimed: 

Application No.    Date of Filing (day/month/year)    Priority Claimed

60/167,502         24 November 1999                   Yes _X_ No _
I hereby appoint the following attorneys and agents to prosecute that application and to transact all
business in the Patent and Trademark Office connected therewith and to file, to prosecute and to 
transact all business in connection with all patent applications directed to the invention: 

Lisa E. Alexander, Reg. No. 41,576 
Robert P. Blackburn, Reg. No. 30,447 
Anne S. Dollard, Reg. No. 43,935 
Joseph H. Guth, Reg. No. 31,261 
Alisa A. Harbin, Reg. No. 33,895 
Charlene A. Launer, Reg. No. 33,035 
David P. Lentini, Reg. No. 33,944 
Kimberlin L. Morley, Reg. No. 35,391 
Roberta L. Robins, Reg. No. 33,208 
Dahna S. Pasternak, Reg. No. 41,411 
Gary R. Fabian, Ph.D., Reg. No. 33,875 
Cathleen M. Rocco, Reg. No. 46,172 

Address all correspondence to: Alisa A. Harbin at

CHIRON CORPORATION 
Intellectual Property - R440 
P.O. Box 8097 
Emeryville, CA 94662-8097. 

Address all telephone calls to: Alisa A. Harbin at (510) 923-2708. 

This appointment, including the right to delegate this appointment, shall also apply to the same 
extent to any proceedings established by the Patent Cooperation Treaty. 

I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements 
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under § 1001 of Title 18 of the United States Code and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Signature: Date 

Full Name of Inventor: Doris COIT 

Citizenship: US 

Residence: 

Post Office Address: c/o Chiron Corporation, 4560 Horton Street - R440, Emeryville, CA 94608 
Signature: Date

Full Name of Inventor: Angelica MEDINA-SELBY 

Citizenship: US 

Residence: San Francisco, CA 

Post Office Address: c/o Chiron Corporation, 4560 Horton Street - R440, Emeryville, CA 94608 



Signature: Date

Full Name of Inventor: Mark SELBY 

Citizenship: US 

Residence: San Francisco, CA 



Post Office Address: c/o Chiron Corporation, 4560 Horton Street - R440, Emeryville, CA 94608 



Signature: Date

Full Name of Inventor: Michael HOUGHTON 
Citizenship: UK 
Residence: Berkeley, CA 



Post Office Address: c/o Chiron Corporation, 4560 Horton Street - R440, Emeryville, CA 94608 
Atty Dkt No. PP01617.002
PATENT 

"Express Mail" Mailing Label No. EL 668 933 832 US 
Date of Deposit November 22, 2000 

I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express 
Mail Post Office to Addressee" service under 37 C.F.R. § 1.10 on the date indicated above and is 
addressed to the Assistant Commissioner for Patents, Washington, D.C. 20231.




Typed or Printed Name of Person Mailing Paper or Fee



Signature of Person Mailing Paper or Fee 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In Re Application of: 
COIT et al.

Serial No.: Group Art Unit: Unassigned 

Filing Date: even date Examiner: Unassigned 

Title: NOVEL HCV NON-STRUCTURAL POLYPEPTIDE 

STATEMENT TO SUPPORT FILING AND SUBMISSION IN ACCORDANCE 
WITH 37 C.F.R. §§ 1.821-1.825 

Assistant Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

The undersigned hereby states that the content of the attached papers and the computer-readable 
copy of the Sequence Listing, submitted in accordance with 37 C.F.R. §§ 1.821(c) and (e), respectively, 
are the same. 

Respectfully submitted, 



Date: Nov. 22, 2000



Dahna S. Pasternak 
Registration No. 41,411
Attorney for Applicants 



CHIRON CORPORATION 
Intellectual Property - R440 
P.O. Box 8097 
Emeryville, CA 94662-8097 
Telephone: (510) 923-2708
Facsimile: (510) 655-3542



SEQUENCE LISTING 

<110> Coit, Doris 

Medina-Selby, Angelica 
Selby, Mark 
Houghton, Michael 

<120> NOVEL HCV NON-STRUCTURAL POLYPEPTIDE

<130> PP01617.002 

<140> 
<141> 

<160> 19 

<170> PatentIn Ver. 2.0

<210> 1 
<211> 9620 
<212> DNA 

<213> Artificial Sequence 

<220> 
<221> CDS 

<222> (1990)..(7302)
<220> 

<223> Description of Artificial Sequence: Hepatitis C pns345 



<400> 1 
cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60

agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120

tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180

ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240

atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300

ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360

cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420

gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480

gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540

ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600

ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660

atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720

cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780

tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840

agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900 

tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960 

aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020 

gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080

gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140

acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200

tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260

agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320 

ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380

actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440

atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500 

agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560 

gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620

gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680

gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740

ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800 

agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860 

gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920

acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980 

gaattcacc atg gct gca tat gca gct cag ggc tat aag gtg cta gta ctc 2031
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu
1 5 10

aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac atg tcc aag 2079
Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys
15 20 25 30

gct cat ggg atc gat cct aac atc agg acc ggg gtg aga aca att acc 2127
Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr
35 40 45

act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc ctt gcc gac 2175
Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp
50 55 60

ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt gac gag tgc 2223
Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys
65 70 75

cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act gtc ctt gac 2271
His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp
80 85 90

caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc acc gcc acc 2319
Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr
95 100 105 110

cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag gag gtt gct 2367
Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala
115 120 125

ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct atc ccc ctc 2415
Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu
130 135 140

gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat tca aag aag 2463
Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys
145 150 155

aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc atc aat gcc 2511
Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala
160 165 170

gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg acc agc ggc 2559
Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly
175 180 185 190

gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc tat acc ggc 2607
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly
195 200 205

gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc cag aca gtc 2655
Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val
210 215 220

gat ttc agc ctt gac cct acc ttc acc att gag aca atc acg ctc ccc 2703
Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro
225 230 235

caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act ggc agg ggg 2751
Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly
240 245 250

aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc ccc tcc ggc 2799
Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly
255 260 265 270

atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca ggc tgt gct 2847
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala
275 280 285

tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta cga gcg tac 2895
Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr
290 295 300

atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt gaa ttt tgg 2943
Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp
305 310 315

gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac ttt cta tcc 2991
Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser
320 325 330

cag aca aag cag agt ggg gag aac ctt cct tac ctg gta gcg tac caa 3039
Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln
335 340 345 350

gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg tgg gac cag 3087
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln
355 360 365

atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat ggg cca aca 3135
Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr
370 375 380

ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc acc ctg acg 3183
Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr
385 390 395

cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc gac ctg gag 3231
His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu
400 405 410

gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg gct gct ttg 3279
Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu
415 420 425 430

gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg ggc agg gtc 3327
Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val
435 440 445

gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa gtc ctc tac 3375
Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr
450 455 460

cga gag ttc gat gag atg gaa gag tgc tct cag cac tta ccg tac atc 3423
Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile
465 470 475

gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag gcc ctc ggc 3471
Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly
480 485 490

ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc cct gct gtc 3519
Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val
495 500 505 510

cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag cat atg tgg 3567
Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp
515 520 525

aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca acg ctg cct 3615
Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro
530 535 540 

ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct gct gtc acc 3663
Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr
545 550 555

agc cca cta acc act agc caa acc ctc ctc ttc aac ata ttg ggg ggg 3711
Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly
560 565 570

tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act gcc ttt gtg 3759
Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val
575 580 585 590

ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga ctg ggg aag 3807
Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys
595 600 605

gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg gcg gga gct 3855
Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala
610 615 620

ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc acg gag gac 3903
Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp
625 630 635

ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc ctc gta gtc 3951
Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val
640 645 650

ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc ccg ggc gag 3999
Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu
655 660 665 670

ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc tcc cgg ggg 4047
Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly
675 680 685

aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat gca gct gcc 4095
Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala
690 695 700

cgc gtc act gcc ata ctc agc agc ctc act gta acc cag ctc ctg agg 4143
Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg
705 710 715

cga ctg cac cag tgg ata agc tcg gag tgt acc act cca tgc tcc ggt 4191
Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly
720 725 730

tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg ttg agc gac 4239
Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp
735 740 745 750

ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg cct ggg atc 4287
Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile
755 760 765

ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg cga ggg gac 4335
Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp
770 775 780

ggc atc atg cac act cgc tgc cac tgt gga gct gag atc act gga cat 4383
Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His
785 790 795

gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc tgc agg aac 4431
Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn
800 805 810

atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg ggc ccc tgt 4479
Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys
815 820 825 830

acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg agg gtg tct 4527
Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser
835 840 845

gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc cac tac gtg 4575
Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val
850 855 860

acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag gtc cca tcg 4623
Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser
865 870 875

ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat agg ttt gcg 4671
Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala
880 885 890

ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc aga gta gga 4719
Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly
895 900 905 910

ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag ccc gaa ccg 4767
Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro
915 920 925

gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc cat ata aca 4815
Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr
930 935 940

gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc ccc tct gtg 4863
Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val
945 950 955

gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc aag gca act 4911
Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr
960 965 970

tgc acc gct aac cat gac tcc cct gat gct gag ctc ata gag gcc aac 4959
Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn
975 980 985 990

ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg gtt gag tca 5007
Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser
995 1000 1005

gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt gtg gcg gag 5055
Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu
1010 1015 1020

gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg cgg aag tct 5103
Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser
1025 1030 1035

cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg gac tat aac 5151
Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn
1040 1045 1050

ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa cca cct gtg 5199
Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val
1055 1060 1065 1070

gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct gtg cct ccg 5247
Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro
1075 1080 1085

cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc cta tct act 5295
Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr
1090 1095 1100

gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc tca act tcc 5343
Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser
1105 1110 1115

ggc att acg ggc gac aat acg aca aca tcc tct gag ccc gcc cct tct 5391
Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser
1120 1125 1130

ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc atg ccc ccc 5439
Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro
1135 1140 1145 1150

ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg tca tgg tca 5487
Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser
1155 1160 1165

acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc tgc tca atg 5535
Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met
1170 1175 1180

tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc gcg gaa gaa 5583
Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu
1185 1190 1195

cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta cgt cac cac 5631
Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His
1200 1205 1210

aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa agg cag aag 5679
Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys
1215 1220 1225 1230 

aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat tac cag gac 5727
Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp
1235 1240 1245

gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag gct aac ttg 5775
Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu
1250 1255 1260

cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac tca gcc aaa 5823
Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys
1265 1270 1275

tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat gcc aga aag 5871
Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys
1280 1285 1290

gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg gaa gac aat 5919
Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn
1295 1300 1305 1310

gta aca cca ata gac act acc atc atg gct aag aac gag gtt ttc tgc 5967
Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys
1315 1320 1325

gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc atc gtg ttc 6015
Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe
1330 1335 1340

ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg tac gac gtg 6063
Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val
1345 1350 1355

gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac gga ttc caa 6111
Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln
1360 1365 1370

tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg tgg aag tcc 6159
Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser
1375 1380 1385 1390

aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc ttt gac tcc 6207
Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser
1395 1400 1405

aca gtc act gag agc gac atc cgt acg gag gag gca atc tac caa tgt 6255
Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys
1410 1415 1420

tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc ctc acc gag 6303
Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu
1425 1430 1435

agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg gag aac tgc 6351
Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys
1440 1445 1450 

ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act agc tgt ggt 6399
Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly
1455 1460 1465 1470

aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt cga gcc gca 6447
Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala
1475 1480 1485

ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac tta gtc gtt 6495
Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val
1490 1495 1500

atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc ctg aga gcc 6543
Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala
1505 1510 1515

ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg gac ccc cca 6591
Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro
1520 1525 1530

caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc tcc aac gtg 6639
Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val
1535 1540 1545 1550

tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac ctc acc cgt 6687
Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg
1555 1560 1565

gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca gca aga cac 6735
Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His
1570 1575 1580

act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt gcc ccc aca 6783
Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr
1585 1590 1595

ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc gtc ctt ata 6831
Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile
1600 1605 1610

gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc tac ggg gcc 6879
Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala
1615 1620 1625 1630

tgc tac tcc ata gaa cca ctg gat cta cct cca atc att caa aga ctc 6927
Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu
1635 1640 1645

cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca ggt gaa atc 6975
His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile
1650 1655 1660

aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg ccc ttg cga 7023
Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg
1665 1670 1675

gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt ctg gcc aga 7071
Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg
1680 1685 1690

gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac tgg gca gta 7119
Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val
1695 1700 1705 1710

aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc cag ctg gac 7167
Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp
1715 1720 1725

ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac att tat cac 7215
Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His
1730 1735 1740

agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc cta ctc ctg 7263
Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu
1745 1750 1755

ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga tgaaggttgg 7312
Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770






y y L- ct cj. cl • ci i_. 


ee ci erect as a 


aaaaaaaaaa aatctagaaa ggcgcgccaa gatatcaagg 7372

atccactacg cgttagagct cgctgatcag cctcgactgt gccttctagt tgccagccat 7432

ctgttgtttg cccctccccc gtgccttcct tgaccctgga aggtgccact cccactgtcc 7492

tttcctaata aaatgaggaa attgcatcgc attgtctgag taggtgtcat tctattctgg 7552

ggggtggggt ggggcaggac agcaaggggg aggattggga agacaatagc aggcatgctg 7612

gggagctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 7672

gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 7732

ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 7792

ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 7852

cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 7912

ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 7972

tcgggaagcg tggcgctttc tcaatgctca cgctgtaggt atctcagttc ggtgtaggtc 8032

gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 8092

tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 8152

gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 8212

tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 8272

ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 8332

agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 8392

gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 8452

attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 8512

agttttaaat caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 8572

atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 8632

cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 8692

ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 8752

agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 8812

tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 8872

gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 8932

caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 8992

ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 9052

gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 9112

tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 9172

tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 9232

cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 9292

cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 9352

gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 9412

atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 9472

agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 9532

ccccgaaaag tgccacctga cgtctaagaa accattatta tcatgacatt aacctataaa 9592

aataggcgta tcacgaggcc ctttcgtc 9620

<210> 2
<211> 1771
<212> PRT
<213> Hepatitis C virus

<220>
<223> Description of Artificial Sequence: Hepatitis C pns345

<400> 2

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 
20 25 30 

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
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305 310 315 320 

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335 

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365 

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380 

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400 

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415 

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445 

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480 

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495 

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510 

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525 

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560 

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575 

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590 

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605 

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640 

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655 

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670 

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720 

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735 

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750 

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780 

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815 

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830 

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940 

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975 

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990 

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005 

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020 

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215 

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390 

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770 



<210> 3 

<211> 9620 

<212> DNA 

<213> Artificial Sequence 

<220> 
<221> CDS 

<222> (1990)..(7302)
<220>

<223> Description of Artificial Sequence: pDeltaNS3NS5
<400> 3

cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60
agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120
tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180
ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240
atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300
ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360
cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420
gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480
gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540
ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600
ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660
atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720
cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780
tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840
agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900
tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960
aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020
gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080
gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140
acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200
tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260
agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320
ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380
actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440
atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500
agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560
gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620
gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680
gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740
ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800
agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860
gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920
acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980

gaattcacc atg gct gca tat gca gct cag ggc tat aag gtg cta gta ctc 2031
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu
1 5 10

aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac atg tcc aag 2079
Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys
15 20 25 30

gct cat ggg atc gat cct aac atc agg acc ggg gtg aga aca att acc 2127
Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr
35 40 45 

act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc ctt gcc gac 2175
Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp
50 55 60 

ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt gac gag tgc 2223
Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys
65 70 75 

cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act gtc ctt gac 2271
His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp
80 85 90 

caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc acc gcc acc 2319
Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr
95 100 105 110 

cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag gag gtt gct 2367
Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala
115 120 125 

ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct atc ccc ctc 2415
Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu
130 135 140 

gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat tca aag aag 2463
Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys
145 150 155 

aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc atc aat gcc 2511
Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala
160 165 170 

gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg acc agc ggc 2559
Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly
175 180 185 190 

gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc tat acc ggc 2607
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly 
195 200 205 

gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc cag aca gtc 2655
Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val
210 215 220 

gat ttc agc ctt gac cct acc ttc acc att gag aca atc acg ctc ccc 2703
Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro
225 230 235 

caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act ggc agg ggg 2751
Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly
240 245 250 

aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc ccc tcc ggc 2799
Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly
255 260 265 270 

atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca ggc tgt gct 2847
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala 
275 280 285 

tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta cga gcg tac 2895
Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr 
290 295 300 

atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt gaa ttt tgg 2943
Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp
305 310 315 

gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac ttt cta tcc 2991
Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser
320 325 330 

cag aca aag cag agt ggg gag aac ctt cct tac ctg gta gcg tac caa 3039
Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln
335 340 345 350 

gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg tgg gac cag 3087
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln
355 360 365 

atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat ggg cca aca 3135
Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr
370 375 380 

ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc acc ctg acg 3183
Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr
385 390 395 

cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc gac ctg gag 3231
His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu
400 405 410

gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg gct gct ttg 3279
Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu 
415 420 425 430 

gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg ggc agg gtc 3327
Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val
435 440 445 

gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa gtc ctc tac 3375
Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr
450 455 460

cga gag ttc gat gag atg gaa gag tgc tct cag cac tta ccg tac atc 3423
Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile
465 470 475 

gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag gcc ctc ggc 3471
Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly
480 485 490 

ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc cct gct gtc 3519
Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val
495 500 505 510 

cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag cat atg tgg 3567
Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp
515 520 525 

aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca acg ctg cct 3615
Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro
530 535 540

ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct gct gtc acc 3663
Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr
545 550 555 

agc cca cta acc act agc caa acc ctc ctc ttc aac ata ttg ggg ggg 3711
Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly
560 565 570 

tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act gcc ttt gtg 3759
Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val
575 580 585 590 

ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga ctg ggg aag 3807
Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys
595 600 605 

gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg gcg gga gct 3855
Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala
610 615 620

ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc acg gag gac 3903
Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp
625 630 635 

ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc ctc gta gtc 3951
Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val
640 645 650 

ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc ccg ggc gag 3999
Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu
655 660 665 670 

ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc tcc cgg ggg 4047
Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly
675 680 685 

aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat gca gct gcc 4095
Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala 
690 695 700 

cgc gtc act gcc ata ctc agc agc ctc act gta acc cag ctc ctg agg 4143
Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg
705 710 715 

cga ctg cac cag tgg ata agc tcg gag tgt acc act cca tgc tcc ggt 4191
Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly
720 725 730 

tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg ttg agc gac 4239
Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp
735 740 745 750 

ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg cct ggg atc 4287
Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile
755 760 765 

ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg cga ggg gac 4335
Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp
770 775 780 

ggc atc atg cac act cgc tgc cac tgt gga gct gag atc act gga cat 4383
Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His
785 790 795 

gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc tgc agg aac 4431
Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn
800 805 810 

atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg ggc ccc tgt 4479
Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys
815 820 825 830 

acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg agg gtg tct 4527
Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser 
835 840 845 

gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc cac tac gtg 4575 
Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val
850 855 860 

acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag gtc cca tcg 4623
Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser
865 870 875 

ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat agg ttt gcg 4671
Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala 
880 885 890 

ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc aga gta gga 4719
Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly 
895 900 905 910 

ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag ccc gaa ccg 4767
Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro
915 920 925 

gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc cat ata aca 4815
Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr
930 935 940 

gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc ccc tct gtg 4863
Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val 
945 950 955 

gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc aag gca act 4911
Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr
960 965 970 

tgc acc gct aac cat gac tcc cct gat gct gag ctc ata gag gcc aac 4959
Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn
975 980 985 990 

ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg gtt gag tca 5007
Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser
995 1000 1005 

gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt gtg gcg gag 5055
Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu
1010 1015 1020 

gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg cgg aag tct 5103
Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser
1025 1030 1035 

cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg gac tat aac 5151
Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn
1040 1045 1050 

ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa cca cct gtg 5199
Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val 
1055 1060 1065 1070 

gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct gtg cct ccg 5247
Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro 
1075 1080 1085 

cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc cta tct act 5295
Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr
1090 1095 1100 

gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc tca act tcc 5343
Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser 
1105 1110 1115 

ggc att acg ggc gac aat acg aca aca tcc tct gag ccc gcc cct tct 5391
Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser
1120 1125 1130 

ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc atg ccc ccc 5439
Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro 
1135 1140 1145 1150 

ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg tca tgg tca 5487
Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser 
1155 1160 1165 

acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc tgc tca atg 5535
Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met 
1170 1175 1180 

tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc gcg gaa gaa 5583
Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu 
1185 1190 1195 

cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta cgt cac cac 5631
Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His
1200 1205 1210 

aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa agg cag aag 5679
Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys
1215 1220 1225 1230 

aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat tac cag gac 5727
Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp
1235 1240 1245 

gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag gct aac ttg 5775
Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu 
1250 1255 1260 

cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac tca gcc aaa 5823
Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys 
1265 1270 1275 

tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat gcc aga aag 5871
Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys 
1280 1285 1290 

gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg gaa gac aat 5919
Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn
1295 1300 1305 1310 

gta aca cca ata gac act acc atc atg gct aag aac gag gtt ttc tgc 5967
Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys
1315 1320 1325 

gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc atc gtg ttc 6015
Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe
1330 1335 1340 

ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg tac gac gtg 6063
Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val 
1345 1350 1355 

gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac gga ttc caa 6111
Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln
1360 1365 1370 

tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg tgg aag tcc 6159
Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser
1375 1380 1385 1390 

aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc ttt gac tcc 6207
Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser 
1395 1400 1405 

aca gtc act gag agc gac atc cgt acg gag gag gca atc tac caa tgt 6255
Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys
1410 1415 1420 

tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc ctc acc gag 6303
Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu
1425 1430 1435 

agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg gag aac tgc 6351
Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys 
1440 1445 1450 

ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act agc tgt ggt 6399
Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly 
1455 1460 1465 1470 

aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt cga gcc gca 6447
Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala
1475 1480 1485 

ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac tta gtc gtt 6495
Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val
1490 1495 1500 

atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc ctg aga gcc 6543
Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala
1505 1510 1515 

ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg gac ccc cca 6591
Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro 
1520 1525 1530 

caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc tcc aac gtg 6639
Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val
1535 1540 1545 1550 

tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac ctc acc cgt 6687
Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg
1555 1560 1565 

gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca gca aga cac 6735
Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His 
1570 1575 1580 

act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt gcc ccc aca 6783
Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr
1585 1590 1595 

ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc gtc ctt ata 6831
Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile
1600 1605 1610 

gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc tac ggg gcc 6879
Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala
1615 1620 1625 1630 

tgc tac tcc ata gaa cca ctg gat cta cct cca atc att caa aga ctc 6927
Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu
1635 1640 1645 

cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca ggt gaa atc 6975
His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile
1650 1655 1660 

aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg ccc ttg cga 7023
Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg 
1665 1670 1675 

gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt ctg gcc aga 7071
Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg 
1680 1685 1690 

gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac tgg gca gta 7119
Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val
1695 1700 1705 1710 

aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc cag ctg gac 7167
Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp
1715 1720 1725 

ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac att tat cac 7215
Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His
1730 1735 1740 

agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc cta ctc ctg 7263
Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu
1745 1750 1755 

ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga tgaaggttgg 7312
Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770 

ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaaa ggcgcgccaa gatatcaagg 7372 
atccactacg cgttagagct cgctgatcag cctcgactgt gccttctagt tgccagccat 7432 
ctgttgtttg cccctccccc gtgccttcct tgaccctgga aggtgccact cccactgtcc 7492 
tttcctaata aaatgaggaa attgcatcgc attgtctgag taggtgtcat tctattctgg 7552 
ggggtggggt ggggcaggac agcaaggggg aggattggga agacaatagc aggcatgctg 7612 
gggagctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 7672 
gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 7732 
ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 7792 
ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 7852 
cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 7912 
ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 7972 
tcgggaagcg tggcgctttc tcaatgctca cgctgtaggt atctcagttc ggtgtaggtc 8032
gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 8092 
tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 8152 
gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 8212 
tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 8272
ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 8332 
agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 8392
gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 8452 
attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 8512 
agttttaaat caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 8572 
atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 8632 
cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 8692 
ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 8752 
agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 8812 
tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 8872 
gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 8932 
caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 8992 
ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 9052 
gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 9112 
tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 9172 
tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 9232 
cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 9292
cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 9352 
gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 9412 
atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 9472 
agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 9532 
ccccgaaaag tgccacctga cgtctaagaa accattatta tcatgacatt aacctataaa 9592 
aataggcgta tcacgaggcc ctttcgtc 9620



<210> 4 
<211> 1771 
<212> PRT 

<213> Artificial Sequence 
<400> 4 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 
20 25 30 

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45 

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60 

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80 

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95 

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125 

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140 

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160 

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175 

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190 

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220 

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240 

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255 

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320 

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335 

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365 

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380 

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400 

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415 

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445 

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480 
Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495 

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510 

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525 

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560 

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575 

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590 

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605 

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620 

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640 

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655 

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670 

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720 

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735 

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750 

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780
Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815 

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830 

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925 

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940 

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975 

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990 

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005 

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020 

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 



Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 
Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215 

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230 

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535 

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 
Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770 

<210> 5 
<211> 4282 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pCMVII 
<400> 5

tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60
cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120
ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180
accatatgaa gctttttgca aaagcctagg cctccaaaaa agcctcctca ctacttctgg 240
aatagctcag aggccgaggc ggcctcggcc tctgcataaa taaaaaaaat tagtcagcca 300
tggggcggag aatgggcgga actgggcggg gagggaatta ttggctattg gccattgcat 360
acgttgtatc tatatcataa tatgtacatt tatattggct catgtccaat atgaccgcca 420
tgttgacatt gattattgac tagttattaa tagtaatcaa ttacggggtc attagttcat 480
agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc tggctgaccg 540
cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt aacgccaata 600
gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca cttggcagta 660
catcaagtgt atcatatgcc aagtccgccc cctattgacg tcaatgacgg taaatggccc 720
gcctggcatt atgcccagta catgacctta cgggactttc ctacttggca gtacatctac 780
gtattagtca tcgctattac catggtgatg cggttttggc agtacaccaa tgggcgtgga 840
tagcggtttg actcacgggg atttccaagt ctccacccca ttgacgtcaa tgggagtttg 900
ttttggcacc aaaatcaacg ggactttcca aaatgtcgta ataaccccgc cccgttgacg 960
caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctcg tttagtgaac 1020
cgtcagatcg cctggagacg ccatccacgc tgttttgacc tccatagaag acaccgggac 1080
cgatccagcc tccgcggccg ggaacggtgc attggaacgc ggattccccg tgccaagagt 1140
gacgtaagta ccgcctatag actctatagg cacacccctt tggctcttat gcatgctata 1200
ctgtttttgg cttggggcct atacaccccc gcttccttat gctataggtg atggtatagc 1260
ttagcctata ggtgtgggtt attgaccatt attgaccact cccctattgg tgacgatact 1320
ttccattact aatccataac atggctcttt gccacaacta tctctattgg ctatatgcca 1380
atactctgtc cttcagagac tgacacggac tctgtatttt tacaggatgg ggtcccattt 1440
attatttaca aattcacata tacaacaacg ccgtcccccg tgcccgcagt ttttattaaa 1500
catagcgtgg gatctccacg cgaatctcgg gtacgtgttc cggacatggg ctcttctccg 1560
gtagcggcgg agcttccaca tccgagccct ggtcccatgc ctccagcggc tcatggtcgc 1620
tcggcagctc cttgctccta acagtggagg ccagacttag gcacagcaca atgcccacca 1680
ccaccagtgt gccgcacaag gccgtggcgg tagggtatgt gtctgaaaat gagctcggag 1740
attgggctcg caccgctgac gcagatggaa gacttaaggc agcggcagaa gaagatgcag 1800
gcagctgagt tgttgtattc tgataagagt cagaggtaac tcccgttgcg gtgctgttaa 1860
cggtggaggg cagtgtagtc tgagcagtac tcgttgctgc cgcgcgcgcc accagacata 1920
atagctgaca gactaacaga ctgttccttt ccatgggtct tttctgcagt caccgtcgtc 1980
gacctaagaa ttcagactcg agcaagtcta gaaaggcgcg ccaagatatc aaggatccac 2040
tacgcgttag agctcgctga tcagcctcga ctgtgccttc tagttgccag ccatctgttg 2100
tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact gtcctttcct 2160
aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt ctggggggtg 2220
gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat gctggggagc 2280
tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 2340
tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 2400
aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 2460
tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 2520
tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 2580
cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 2640
agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 2700
tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 2760
aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 2820
ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 2880
cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 2940
accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 3000
ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 3060
ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 3120
gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 3180
aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 3240
gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 3300
gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 3360
cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 3420
gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 3480
gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca 3540
ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 3600
tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 3660
ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 3720
cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 3780
accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata 3840
cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 3900
tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 3960
cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 4020
acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 4080
atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 4140
tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 4200
aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta taaaaatagg 4260
cgtatcacga ggccctttcg tc 4282

<210> 6
<211> 6299
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pNS34a 

<220> 
<221> CDS 

<222> (1990)..(4047)
<400> 6

cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60
agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120
tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180
ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240
atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300
ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360
cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420
gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480
gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540
ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600
ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660
atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720
cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780
tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840
agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900
tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960
aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020
gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080
gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140
acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200
tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260
agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320
ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380
actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440
atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500
agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560
gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620
gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680
gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740
ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800
agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860
gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920
acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980

gaattcacc atg gcg ccc atc acg gcg tac gcc cag cag aca agg ggc ctc 2031
Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu
1 5 10

cta ggg tgc ata atc acc agc cta act ggc cgg gac aaa aac caa gtg 2079
Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val
15 20 25 30 

gag ggt gag gtc cag att gtg tca act gct gcc caa acc ttc ctg gca 2127
Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala
35 40 45 

acg tgc atc aat ggg gtg tgc tgg act gtc tac cac ggg gcc gga acg 2175
Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr
50 55 60 

agg acc atc gcg tca ccc aag ggt cct gtc atc cag atg tat acc aat 2223
Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn
65 70 75 

gta gac caa gac ctt gtg ggc tgg ccc gct tcg caa ggt acc cgc tca 2271
Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser
80 85 90 

ttg aca ccc tgc act tgc ggc tcc tcg gac ctt tac ctg gtc acg agg 2319
Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg 
95 100 105 110 

cac gcc gat gtc att ccc gtg cgc cgg cgg ggt gat agc agg ggc agc 2367
His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser
115 120 125 

ctg ctg tcg ccc cgg ccc att tcc tac ttg aaa ggc tcc tcg ggg ggt 2415
Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly
130 135 140 

ccg ctg ttg tgc ccc gcg ggg cac gcc gtg ggc ata ttt agg gcc gcg 2463
Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala
145 150 155 

gtg tgc acc cgt gga gtg gct aag gcg gtg gac ttt atc cct gtg gag 2511
Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu
160 165 170 

aac cta gag aca acc atg agg tcc ccg gtg ttc acg gat aac tcc tct 2559
Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser 
175 180 185 190 

cca cca gta gtg ccc cag agc ttc cag gtg gct cac ctc cat gct ccc 2607
Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala Pro
195 200 205 

aca ggc agc ggc aaa agc acc aag gtc ccg gct gca tat gca gct cag 2655
Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln
210 215 220 

ggc tat aag gtg cta gta ctc aac ccc tct gtt gct gca aca ctg ggc 2703
Gly Tyr Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly 
225 230 235 

ttt ggt gct tac atg tcc aag gct cat ggg atc gat cct aac atc agg 2751
Phe Gly Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg
240 245 250 

acc ggg gtg aga aca att acc act ggc agc ccc atc acg tac tcc acc 2799
Thr Gly Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr
255 260 265 270 

tac ggc aag ttc ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac 2847
Tyr Gly Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp 
275 280 285 

ata ata att tgt gac gag tgc cac tcc acg gat gcc aca tcc atc ttg 2895
Ile Ile Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu
290 295 300 

ggc att ggc act gtc ctt gac caa gca gag act gcg ggg gcg aga ctg 2943
Gly Ile Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu
305 310 315 

gtt gtg ctc gcc acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat 2991
Val Val Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His 
320 325 330 

ccc aac atc gag gag gtt gct ctg tcc acc acc gga gag atc cct ttt 3039
Pro Asn Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe
335 340 345 350 

tac ggc aag gct atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc 3087
Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu
355 360 365 

atc ttc tgt cat tca aag aag aag tgc gac gaa ctc gcc gca aag ctg 3135
Ile Phe Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu
370 375 380

gtc gca ttg ggc atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg 3183
Val Ala Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val
385 390 395 

tcc gtc atc ccg acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc 3231
Ser Val Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala
400 405 410 

ctc atg acc ggc tat acc ggc gac ttc gac tcg gtg ata gac tgc aat 3279
Leu Met Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn
415 420 425 430 

acg tgt gtc acc cag aca gtc gat ttc agc ctt gac cct acc ttc acc 3327
Thr Cys Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr
435 440 445 

att gag aca atc acg ctc ccc caa gat gct gtc tcc cgc act caa cgt 3375
Ile Glu Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg
450 455 460 

c 99 gg c a 99 act 99 c a 99 999 aa 9 cca 99° atc tac a 9 a ttt: 9 fc 9 9 ca 3423 

Arg Gly Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala
465 470 475 

ccg ggg gag cgc ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag 3471
Pro Gly Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu 
480 485 490 

tgc tat gac gca ggc tgt gct tgg tat gag ctc acg ccc gcc gag act 3519
Cys Tyr Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr 
495 500 505 510 

aca gtt agg cta cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc 3567
Thr Val Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys 
515 520 525 

cag gac cat ctt gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat 3615 
Gln Asp His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His
530 535 540 

ata gat gcc cac ttt cta tcc cag aca aag cag agt ggg gag aac ctt 3663
Ile Asp Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu
545 550 555 

cct tac ctg gta gcg tac caa gcc acc gtg tgc gct agg gct caa gcc 3711
Pro Tyr Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala
560 565 570 

cct ccc cca tcg tgg gac cag atg tgg aag tgt ttg att cgc ctc aag 3759
Pro Pro Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys
575 580 585 590 

ccc acc ctc cat ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt 3807
Pro Thr Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val 
595 600 605 
cag aat gaa atc acc ctg acg cac cca gtc acc aaa tac atc atg aca 3855
Gln Asn Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr
610 615 620 

tgc atg tcg gcc gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt 3903
Cys Met Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val 
625 630 635 

ggc ggc gtc ctg gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc 3951
Gly Gly Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys 
640 645 650 

gtg gtc ata gtg ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata 3999
Val Val Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile
655 660 665 670 



cct gac agg gaa gtc ctc tac cga gag ttc gat gag atg gaa gag tgc 4047
Pro Asp Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys
675 680 685

taggatccac tacgcgttag agctcgctga tcagcctcga ctgtgccttc tagttgccag 4107
ccatctgttg tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact 4167
gtcctttcct aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt 4227
ctggggggtg gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat 4287
gctggggagc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg 4347
gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa 4407
cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc 4467
gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc 4527
aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag 4587
ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct 4647
cccttcggga agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta 4707
ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc 4767
cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc 4827
agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt 4887
gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct 4947
gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc 5007
tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca 5067
agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta 5127
agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa 5187
atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg 5247
cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg 5307
actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc 5367
aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc 5427
cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa 5487
ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc 5547
cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg 5607
ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc 5667
cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat 5727
ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg 5787
tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc 5847
ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg 5907
aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat 5967
gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg 6027
gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg 6087
ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct 6147
catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac 6207
atttccccga aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta 6267
taaaaatagg cgtatcacga ggccctttcg tc 6299



<210> 7 
<211> 686 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pNS34a 
<400> 7 

Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu Leu Gly
1 5 10 15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
20 25 30 

Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala Thr Cys
35 40 45 
Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr
50 55 60 

Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn Val Asp
65 70 75 80 

Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser Leu Thr
85 90 95 

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala 
100 105 110 

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
115 120 125 

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
130 135 140 

Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
145 150 155 160 

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu
165 170 175 

Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser Pro Pro 
180 185 190 

Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala Pro Thr Gly
195 200 205 

Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln Gly Tyr
210 215 220 

Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly 
225 230 235 240 

Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly
245 250 255 

Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly
260 265 270 

Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile
275 280 285 

Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile
290 295 300 

Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val
305 310 315 320 

Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn 
325 330 335 

Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly
340 345 350 
Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe
355 360 365 

Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala 
370 375 380 

Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val
385 390 395 400 

Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met
405 410 415 

Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys
420 425 430 

Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu
435 440 445 

Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly
450 455 460 

Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly
465 470 475 480 

Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr 
485 490 495 

Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val 
500 505 510 

Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp
515 520 525 

His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp
530 535 540 

Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr
545 550 555 560 

Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro
565 570 575 

Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr
580 585 590 

Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn
595 600 605 

Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met
610 615 620 

Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly 
625 630 635 640 

Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val 
645 650 655 
Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp
660 665 670 

Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys 
675 680 685 



<210> 8 
<211> 19912 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd. deltaNS3NS5 

<220> 
<221> CDS 

<222> (12745)..(18057)
<400> 8 



atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actataacta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
gaaacatgct gcttaaaact ccaagcggta ggagaccgat aaaggttaat aggacagccg 2880
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2940
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 3000
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3060
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3120
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3180
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3240
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3300
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3360
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3420
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3480
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3540
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3600
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3660
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3720
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3780
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3840
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3900
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3960
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 4020
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4080
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4140
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4200
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4260
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4320
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4380
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4440
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4500
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4560
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4620
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4680
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4740
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4800
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4860
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4920
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4980
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 5040
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5100
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5160
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5220
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5280
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5340
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5400
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5460
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5520
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5580
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5640
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5700
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5760
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5820
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5880
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5940
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 6000
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6060
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6120
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6180
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6240
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6300
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6360
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6420
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6480
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6540
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6600
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6660
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6720
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6780
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6840
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6900
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6960
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 7020
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7080
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7140
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7200
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7260
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7320
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7380
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7440
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7500
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7560
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7620
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7680
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7740
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7800
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7860
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7920
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7980
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 8040
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8100
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8160
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8220
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8280
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8340
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8400
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8460
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8520
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8580
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8640
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8700
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8760
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8820
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8880
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8940
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 9000
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9060
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9120
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9180
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9240
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9300
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9360
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9420
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9480
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9540
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9600
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9660
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9720
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9780
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9840
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9900
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9960 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 10020
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10080
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10140
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10200 
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10260
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10320
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10380
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10440
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10500 
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10560
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10620
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10680 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10740
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10800
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10860 
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10920
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10980 
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 11040
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11100 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11160 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11220
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11280
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11340
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11400 
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11460 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11520 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11580 
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11640

ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11700 

atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11760 

tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11820

tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11880 

cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11940 

cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 12000 

ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12060

tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12120

tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12180 

tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12240 

ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12300

tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12360

aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12420

caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12480

tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12540 

ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12600 

gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12660 

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12720

acaagcttac aaaacaaatt cacc atg gct gca tat gca gct cag ggc tat 12771
Met Ala Ala Tyr Ala Ala Gln Gly Tyr
1 5 

aag gtg cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt 12819
Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly 
10 15 20 25 

gct tac atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg 12867
Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly
30 35 40 

gtg aga aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc 12915
Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly
45 50 55 

aag ttc ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata 12963
Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile
60 65 70
att tgt gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att 13011
Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile
75 80 85 

ggc act gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg 13059
Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val
90 95 100 105 

ctc gcc acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac 13107
Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn 
110 115 120 

atc gag gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc 13155
Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly
125 130 135 

aag gct atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc 13203
Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe
140 145 150 

tgt cat tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca 13251
Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala 
155 160 165 

ttg ggc atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc 13299
Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val
170 175 180 185 

atc ccg acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg 13347
Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met
190 195 200 

acc ggc tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt 13395
Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys
205 210 215 

gtc acc cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag 13443
Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu
220 225 230 

aca atc acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc 13491
Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly
235 240 245 

agg act ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg 13539
Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly
250 255 260 265 

gag cgc ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat 13587
Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr 
270 275 280 

gac gca ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt 13635
Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val 
285 290 295 

agg cta cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac 13683
Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp
300 305 310 

cat ctt gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat 13731
His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp
315 320 325 

gcc cac ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac 13779
Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr
330 335 340 345 

ctg gta gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc 13827
Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro
350 355 360 

cca tcg tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc 13875
Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr
365 370 375 

ctc cat ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat 13923
Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn
380 385 390 

gaa atc acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg 13971
Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met
395 400 405 

tcg gcc gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc 14019
Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly 
410 415 420 425 

gtc ctg gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc 14067
Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val 
430 435 440 

ata gtg ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac 14115
Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp
445 450 455 

agg gaa gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag 14163
Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln
460 465 470 

cac tta ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag 14211
His Leu Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys
475 480 485 

cag aag gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt 14259
Gln Lys Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val
490 495 500 505 

atc gcc cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg 14307
Ile Ala Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp
510 515 520 

gcg aag cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc 14355
Ala Lys His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly
525 530 535

ttg tca acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt 14403
Leu Ser Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe
540 545 550 

aca gct gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc 14451
Thr Ala Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe
555 560 565 

aac ata ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc 14499
Asn Ile Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala
570 575 580 585 

gct act gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt 14547
Ala Thr Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser
590 595 600 

gtt gga ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg 14595
Val Gly Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala
605 610 615 

ggc gtg gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc 14643
Gly Val Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val
620 625 630 

ccc tcc acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc 14691
Pro Ser Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro
635 640 645 

gga gcc ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac 14739
Gly Ala Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His
650 655 660 665 

gtt ggc ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc 14787
Val Gly Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala
670 675 680 

ttc gcc tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag 14835
Phe Ala Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu 
685 690 695 

agc gat gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta 14883
Ser Asp Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val
700 705 710 

acc cag ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc 14931
Thr Gln Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr
715 720 725 

act cca tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc 14979
Thr Pro Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys
730 735 740 745 

gag gtg ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca 15027
Glu Val Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro 
750 755 760 
cag ctg cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg 15075
Gln Leu Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly
765 770 775

gtc tgg cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct 15123
Val Trp Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala
780 785 790 

gag atc act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct 15171
Glu Ile Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro
795 800 805 

agg acc tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac 15219
Arg Thr Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr
810 815 820 825 

acc acg ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg 15267 
Thr Thr Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala 
830 835 840 

cta tgg agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg 15315
Leu Trp Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly
845 850 855 

gac ttc cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg 15363 
Asp Phe His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro 
860 865 870 

tgc cag gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc 15411
Cys Gln Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg
875 880 885 

cta cat agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta 15459
Leu His Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val 
890 895 900 905 

tca ttc aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct 15507
Ser Phe Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro
910 915 920 

tgc gag ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat 15555
Cys Glu Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp 
925 930 935 

ccc tcc cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga 15603
Pro Ser His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly
940 945 950 

tca ccc ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca 15651
Ser Pro Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro
955 960 965 

tct ctc aag gca act tgc acc gct aac cat gac tcc cct gat gct gag 15699
Ser Leu Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu 
970 975 980 985 

ctc ata gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc 15747
Leu Ile Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile
990 995 1000 

acc agg gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat 15795
Thr Arg Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp
1005 1010 1015 

ccg ctt gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa 15843
Pro Leu Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu
1020 1025 1030 

atc ctg cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg 15891
Ile Leu Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala
1035 1040 1045

cgg ccg gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac 15939
Arg Pro Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp 
1050 1055 1060 1065 

tac gaa cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc 15987
Tyr Glu Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser 
1070 1075 1080 

cct cct gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa 16035
Pro Pro Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu 
1085 1090 1095 

tca acc cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc 16083
Ser Thr Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly 
1100 1105 1110 

agc tcc tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct 16131
Ser Ser Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser
1115 1120 1125 

gag ccc gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat 16179
Glu Pro Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr 
1130 1135 1140 1145 

tcc tcc atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc 16227
Ser Ser Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser 
1150 1155 1160 

gac ggg tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc 16275
Asp Gly Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val 
1165 1170 1175 

gtg tgc tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg 16323
Val Cys Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro 
1180 1185 1190 

tgc gcc gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg 16371
Cys Ala Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser
1195 1200 1205 

ttg cta cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct 16419
Leu Leu Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala
1210 1215 1220 1225

tgc caa agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac 16467 
Cys Gln Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp
1230 1235 1240 

agc cat tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa 16515
Ser His Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys
1245 1250 1255 

gtg aag gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc 16563
Val Lys Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro 
1260 1265 1270 

cca cac tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt 16611
Pro His Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg 
1275 1280 1285 

tgc cat gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac 16659
Cys His Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp
1290 1295 1300 1305 

ctt ctg gaa gac aat gta aca cca ata gac act acc atc atg gct aag 16707
Leu Leu Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys
1310 1315 1320 

aac gag gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct 16755
Asn Glu Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala
1325 1330 1335 

cgt ctc atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg 16803
Arg Leu Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met
1340 1345 1350 

gct ttg tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc 16851
Ala Leu Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser 
1355 1360 1365 

tcc tac gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg 16899
Ser Tyr Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val
1370 1375 1380 1385 

caa gcg tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc 16947
Gln Ala Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr
1390 1395 1400 

cgc tgc ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag 16995
Arg Cys Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu
1405 1410 1415 

gca atc tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc 17043
Ala Ile Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile
1420 1425 1430 

aag tcc ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca 17091
Lys Ser Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser 
1435 1440 1445 
agg ggg gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg 17139
Arg Gly Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu 
1450 1455 1460 1465 

aca act agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca 17187
Thr Thr Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala
1470 1475 1480 

gcc tgt cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc 17235
Ala Cys Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly
1485 1490 1495 

gac gac tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg 17283
Asp Asp Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala
1500 1505 1510 

gcg agc ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc 17331
Ala Ser Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro 
1515 1520 1525 

cct ggg gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca 17379
Pro Gly Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser
1530 1535 1540 1545 

tgc tcc tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc 17427
Cys Ser Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val 
1550 1555 1560 

tac tac ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg 17475
Tyr Tyr Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp 
1565 1570 1575 

gag aca gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc 17523
Glu Thr Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile
1580 1585 1590 

atg ttt gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc 17571
Met Phe Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe
1595 1600 1605 

ttt agc gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc 17619
Phe Ser Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys
1610 1615 1620 1625 

gag atc tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca 17667
Glu Ile Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro
1630 1635 1640 

atc att caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac 17715
Ile Ile Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr
1645 1650 1655 

tct cca ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg 17763
Ser Pro Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly
1660 1665 1670 

gta ccg ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct 17811
Val Pro Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala
1675 1680 1685 

agg ctt ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc 17859
Arg Leu Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu
1690 1695 1700 1705 

ttc aac tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc 17907
Phe Asn Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala
1710 1715 1720 

gct ggc cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg 17955
Ala Gly Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly
1725 1730 1735 

gga gac att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg 18003
Gly Asp Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp
1740 1745 1750 

ttt tgc eta etc ctg ctt get gca ggg gta ggc ate tac etc etc ccc 18051 
Phe Cys Leu Leu Leu Leu Ala Ala Gly Val Gly lie Tyr Leu Leu Pro 
1755 1760 1765 

aac cga tgaaggttgg ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaac 18107 

Asn Arg 

1770 

ccgagtcgac tttgttccca ctgtactttt agctcgtaca aaatacaata tacttttcat 18167
ttctccgtaa acaacatgtt ttcccatgta atatcctttt ctatttttcg ttccgttacc 18227
aactttacac atactttata tagctattca cttctataca ctaaaaaact aagacaattt 18287
taattttgct gcctgccata tttcaatttg ttataaattc ctataattta tcctattagt 18347
agctaaaaaa agatgaatgt gaatcgaatc ctaagagaat tggatctgat ccacaggacg 18407
ggtgtggtcg ccatgatcgc gtagtcgata gtggctccaa gtagcgaagc gagcaggact 18467
gggcggcggc caaagcggtc ggacagtgct ccgagaacgg gtgcgcatag aaattgcatc 18527
aacgcatata gcgctagcag cacgccatag tgactggcga tgctgtcgga atggacgata 18587
tcccgcaaga ggcccggcag taccggcata accaagccta tgcctacagc atccagggtg 18647
acggtgccga ggatgacgat gagcgcattg ttagatttca tacacggtgc ctgactgcgt 18707
tagcaattta actgtgataa actaccgcat taaagctttt tctttccaat tttttttttt 18767
tcgtcattat aaaaatcatt acgaccgaga ttcccgggta ataactgata taattaaatt 18827
gaagctctaa tttgtgagtt tagtatacat gcatttactt ataatacagt tttttagttt 18887
tgctggccgc atcttctcaa atatgcttcc cagcctgctt ttctgtaacg ttcaccctct 18947
accttagcat cccttccctt tgcaaatagt cctcttccaa caataataat gtcagatcct 19007
gtagagacca catcatccac ggttctatac tgttgaccca atgcgtctcc cttgtcatct 19067
aaacccacac cgggtgtcat aatcaaccaa tcgtaacctt catctcttcc acccatgtct 19127
ctttgagcaa taaagccgat aacaaaatct ttgtcgctct tcgcaatgtc aacagtaccc 19187
ttagtatatt ctccagtaga tagggagccc ttgcatgaca attctgctaa catcaaaagg 19247
cctctaggtt cctttgttac ttcttctgcc gcctgcttca aaccgctaac aatacctggg 19307
cccaccacac cgtgtgcatt cgtaatgtct gcccattctg ctattctgta tacacccgca 19367
gagtactgca atttgactgt attaccaatg tcagcaaatt ttctgtcttc gaagagtaaa 19427
aaattgtact tggcggataa tgcctttagc ggcttaactg tgccctccat ggaaaaatca 19487
gtcaagatat ccacatgtgt ttttagtaaa caaattttgg gacctaatgc ttcaactaac 19547
tccagtaatt ccttggtggt acgaacatcc aatgaagcac acaagtttgt ttgcttttcg 19607
tgcatgatat taaatagctt ggcagcaaca ggactaggat gagtagcagc acgttcctta 19667
tatgtagctt tcgacatgat ttatcttcgt ttcctgcagg tttttgttct gtgcagttgg 19727
gttaagaata ctgggcaatt tcatgtttct tcaacactac atatgcgtat atataccaat 19787
ctaagtctgt gctccttcct tcgttcttcc ttctgttcgg agattaccga atcaaaaaaa 19847
tttcaaggaa accgaaatca aaaaaaagaa taaaaaaaaa atgatgaatt gaaaagctta 19907
tcgat 19912

<210> 9 
<211> 1771 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd.deltaNS3NS5
<400> 9 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770 



<210> 10 
<211> 19798 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd.deltaNS3NS5.pj

<220>
<221> CDS

<222> (12679)..(17991)
<400> 10 

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60 
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240 
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300 
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780 
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840 
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380 
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440 
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920 
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220 
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640 
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820 
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880 
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940 
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000 
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060 
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120 
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180 
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240
tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300
aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360
caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420
tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600
agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660
acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly 
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425 

get get ttg gee gcg tat tgc ctg tea aca ggc tgc gtg gtc ata gtg 14 0 07 
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Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val lie Val 
430 435 440 

ggc agg gtc gtc ttg tec ggg aag ccg gca ate ata cct gac agg gaa 14055 
Gly Arg Val Val Leu Ser Gly Lys Pro Ala lie lie Pro Asp Arg Glu 
445 450 455 

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615 

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680 

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840 

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920 

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130 

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385 

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625 

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770 

tgaatagtcg actttgttcc cactgtactt ttagctcgta caaaatacaa tatacttttc 18051 

atttctccgt aaacaacatg ttttcccatg taatatcctt ttctattttt cgttccgtta 18111 

ccaactttac acatacttta tatagctatt cacttctata cactaaaaaa ctaagacaat 18171 

tttaattttg ctgcctgcca tatttcaatt tgttataaat tcctataatt tatcctatta 18231 

gtagctaaaa aaagatgaat gtgaatcgaa tcctaagaga attggatctg atccacagga 18291
cgggtgtggt cgccatgatc gcgtagtcga tagtggctcc aagtagcgaa gcgagcagga 18351
ctgggcggcg gccaaagcgg tcggacagtg ctccgagaac gggtgcgcat agaaattgca 18411 
tcaacgcata tagcgctagc agcacgccat agtgactggc gatgctgtcg gaatggacga 18471 
tatcccgcaa gaggcccggc agtaccggca taaccaagcc tatgcctaca gcatccaggg 18531 
tgacggtgcc gaggatgacg atgagcgcat tgttagattt catacacggt gcctgactgc 18591 
gttagcaatt taactgtgat aaactaccgc attaaagctt tttctttcca attttttttt 18651 
tttcgtcatt ataaaaatca ttacgaccga gattcccggg taataactga tataattaaa 18711 
ttgaagctct aatttgtgag tttagtatac atgcatttac ttataataca gttttttagt 18771 
tttgctggcc gcatcttctc aaatatgctt cccagcctgc ttttctgtaa cgttcaccct 18831 
ctaccttagc atcccttccc tttgcaaata gtcctcttcc aacaataata atgtcagatc 18891 
ctgtagagac cacatcatcc acggttctat actgttgacc caatgcgtct cccttgtcat 18951 
ctaaacccac accgggtgtc ataatcaacc aatcgtaacc ttcatctctt ccacccatgt 19011 
ctctttgagc aataaagccg ataacaaaat ctttgtcgct cttcgcaatg tcaacagtac 19071 
ccttagtata ttctccagta gatagggagc ccttgcatga caattctgct aacatcaaaa 19131 
ggcctctagg ttcctttgtt acttcttctg ccgcctgctt caaaccgcta acaatacctg 19191 
ggcccaccac accgtgtgca ttcgtaatgt ctgcccattc tgctattctg tatacacccg 19251 
cagagtactg caatttgact gtattaccaa tgtcagcaaa ttttctgtct tcgaagagta 19311 
aaaaattgta cttggcggat aatgccttta gcggcttaac tgtgccctcc atggaaaaat 19371 
cagtcaagat atccacatgt gtttttagta aacaaatttt gggacctaat gcttcaacta 19431 
actccagtaa ttccttggtg gtacgaacat ccaatgaagc acacaagttt gtttgctttt 19491 
cgtgcatgat attaaatagc ttggcagcaa caggactagg atgagtagca gcacgttcct 19551 
tatatgtagc tttcgacatg atttatcttc gtttcctgca ggtttttgtt ctgtgcagtt 19611 
gggttaagaa tactgggcaa tttcatgttt cttcaacact acatatgcgt atatatacca 19671 
atctaagtct gtgctccttc cttcgttctt ccttctgttc ggagattacc gaatcaaaaa 19731 
aatttcaagg aaaccgaaat caaaaaaaag aataaaaaaa aaatgatgaa ttgaaaagct 19791 
tatcgat 19798 

<210> 11 
<211> 1771 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: 
pd.deltaNS3NS5.pj 

<400> 11 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770 



<210> 12 
<211> 20220 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: 
pd.deltaNS3NS5.pj.core121

<220> 
<221> CDS 

<222> (12679)..(18354)
<400> 12 

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60 
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180 
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240 
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300 
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360 
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480 
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540 
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660 
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720 
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780 
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840 
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900 
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960 
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
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ttttctccat aatcttgaag aggccaaaac 
tggtggctca tgttgtaggg ccatgaaagc 
aacggtgtat tgttcactat cccaagcgac 
aaagtaaata cctcccacta attctctaac 
tggcttgatt ggagataagt ctaaaagaga 
ggcgtacaat tgaagttctt tacggatttt 
ggtaccccat ttaggaccac ccacagcacc 
ttccagcgcc tcatctggaa gtggaacacc 
atgattttcg aaatcgaact tgacattgga 
aatggcttcg gctgtgattt cttgaccaac 
aggggcagac attacaatgg tatatccttg 
aaaaaaaaaa atgcagcttc tcaatgatat 
tatccgacaa actgttttac agatttacga 
acatccgaac ctgggagttt tccctgaaac 
tatagtctag cgctttacgg aagacaatgt 
atctattgca taggtaatct tgcacgtcgc 
tgcacttcaa tagcatatct ttgttaacga 
atgcaacgcg agagcgctaa tttttcaaac 
gaaatgcaac gcgaaagcgc tattttacca 
caaaaatgca acgcgagagc gctaattttt 
gaacagaaat gcaacgcgag agcgctattt 
ttctacaaaa atgcatcccg agagcgctat 
tttctccttt gtgcgctcta taatgcagtc 
taaggttaga agaaggctac tttggtgtct 
cacttcccgc gtttactgat tactagcgaa 
atccccgatt atattctata ccgatgtgga 
gcgttgatga ttcttcattg gtcagaaaat 
atactacgta taggaaatgt ttacattttc 
tcttactaca atttttttgt ctaaagagta 




attagcttta tccaaggacc aaataggcaa 4920
ggccattctt gtgattcttt gcacttctgg 4980 
accatcacca tcgtcttcct ttctcttacc 5040 
aacaacgaag tcagtacctt tagcaaattg 5100 
gtcggatgca aagttacatg gtcttaagtt 5160 
tagtaaacct tgttcaggtc taacactacc 5220
taacaaaacg gcatcagcct tcttggaggc 5280 
tgtagcatcg atagcagcac caccaattaa 5340
acgaacatca gaaatagctt taagaacctt 5400
gtggtcacct ggcaaaacga cgatcttctt 5460 
aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
tcgaatacgc tttgaggaga tacagcctaa 5580 
tcgtacttgt tacccatcat tgaattttga 5640 
agatagtata tttgaacctg tataataata 5700 
atgtatttcg gttcctggag aaactattgc 5760 
atccccggtt cattttctgc gtttccatct 5820 
agcatctgtg cttcattttg tagaacaaaa 5880 
aaagaatctg agctgcattt ttacagaaca 5940 
acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaacaaaga atctgagctg catttttaca 6060 
taccaacaaa gaatctatac ttcttttttg 6120 
ttttctaaca aagcatctta gattactttt 6180 
tcttgataac tttttgcact gtaggtccgt 6240 
attttctctt ccataaaaaa agcctgactc 6300 
gctgcgggtg cattttttca agataaaggc 6360
ttgcgcatac tttgtgaaca gaaagtgata 6420
tatgaacggt ttcttctatt ttgtctctat 6480 
gtattgtttt cgattcactc tatgaatagt 6540
atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga
agcacagaga tatatagcaa agagatactt 
aatattttag tagctcgtta cagtccggtg 
gagcgctttt ggttttcaaa agcgctctga 
tcggaatagg aacttcaaag cgtttccgaa 
tgcgcacata cagctcactg ttcacgtcgc 
tatacatgag aagaacggca tagtgcgtgt 
atttatgtag gatgaaaggt agtctagtac 
gtatcgtatg cttccttcag cactaccctt 
tggattagtc tcatccttca atgctatcat 
ccgagaaact agtgcgaagt agtgatcagg 
cctggccacg gcagaagcac gcttatcgct 
taggcccttc attgaaagaa atgaggtcat 
attttttata gcaaagattg aataaggcgc 
gactaagtta tcttttaata attggtattc 
atttactcgt tttaggactg gttcagaatt 
atcgatgata agctgtcaaa catgagaatt 
tatttttata ggttaatgtc atgataataa 
ggggaaatgt gcgcggaacc cctatttgtt 
cgctcatgag acaataaccc tgataaatgc 
gtattcaaca tttccgtgtc gcccttattc 
ttgctcaccc agaaacgctg gtgaaagtaa 
tgggttacat cgaactggat ctcaacagcg 
aacgttttcc aatgatgagc acttttaaag 
ttgacgccgg gcaagagcaa ctcggtcgcc 
agtactcacc agtcacagaa aagcatctta 
gtgctgccat aaccatgagt gataacactg 
gaccgaagga gctaaccgct tttttgcaca 
gttgggaacc ggagctgaat gaagccatac 




aaggtggatg ggtaggttat atagggatat 6660 
ttgagcaatg tttgtggaag cggtattcgc 6720
cgtttttggt tttttgaaag tgcgtcttca 6780 
agttcctata ctttctagag aataggaact 6840
aacgagcgct tccgaaaatg caacgcgagc 6900 
acctatatct gcgtgttgcc tgtatatata 6960 
ttatgcttaa atgcgtactt atatgcgtct 7020 
ctcctgtgat attatcccat tccatgcggg 7080 
tagctgttct atatgctgcc actcctcaat 7140 
ttcctttgat attggatcat atgcatagta 7200 
tattgctgtt atctgatgag tatacgttgt 7260 
ccaatttccc acaacattag tcaactccgt 7320 
caaatgtctt ccaatgtgag attttgggcc 7380
atttttcttc aaagctttat tgtacgatct 7440
ctgtttattg cttgaagaat tgccggtcct 7500 
cctcaaaaat tcatccaaat atacaagtgg 7560 
cttgaagacg aaagggcctc gtgatacgcc 7620 
tggtttctta gacgtcaggt ggcacttttc 7680 
tatttttcta aatacattca aatatgtatc 7740 
ttcaataata ttgaaaaagg aagagtatga 7800
ccttttttgc ggcattttgc cttcctgttt 7860 
aagatgctga agatcagttg ggtgcacgag 7920
gtaagatcct tgagagtttt cgccccgaag 7980
ttctgctatg tggcgcggta ttatcccgtg 8040 
gcatacacta ttctcagaat gacttggttg 8100 
cggatggcat gacagtaaga gaattatgca 8160 
cggccaactt acttctgaca acgatcggag 8220
acatggggga tcatgtaact cgccttgatc 8280
caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460 
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640 
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700 
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060 
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240 
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720 
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840 
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020 
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080 
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260 
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380 
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800 
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640 
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700 
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820 
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880

cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940 

ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000 

tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060

tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120

tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180

ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240 

tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300 

aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360

caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420

tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480

ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540 

gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600 

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660 

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr 
15 20 25 

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55 

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75 

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90 

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105 

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120 

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135 

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155 

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly 
160 165 170 

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185 

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly 
190 195 200 

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215 

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235 

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250 

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265 

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala 
270 275 280 

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu 
285 290 295 

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315 

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330 

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360 

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375 

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395 

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410 

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu 
415 420 425 

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440 

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455 

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475 

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490 

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505 

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520 

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535 

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555 

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570 
ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585 

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600 

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615 

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635 

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665 

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680 

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp 
685 690 695 

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715 

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730 

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745 

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760 

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775 

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795 

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810 

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159 
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825 

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp 
830 835 840 

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855 

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875 

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His 
880 885 890 

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe 
895 900 905 

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920 

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser 
925 930 935 

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955 

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970 

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985 

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000 

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015 

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050 

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu 
1055 1060 1065 

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro 
1070 1075 1080 

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr 
1085 1090 1095 

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser 
1100 1105 1110 1115 

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130 

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser 
1135 1140 1145 

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly 
1150 1155 1160 

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys 
1165 1170 1175 

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala 
1180 1185 1190 1195 

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210 

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225 

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240 

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255 
gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His 
1260 1265 1270 1275 

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His 
1280 1285 1290 

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305 

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320 

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335 

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355 

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr 
1360 1365 1370 

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385 

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys 
1390 1395 1400 

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415 

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435 

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly 
1440 1445 1450 

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr 
1455 1460 1465 

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480 

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495 

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515 

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly 
1520 1525 1530 

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545 

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr 
1550 1555 1560 

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr 
1565 1570 1575 

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595 

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610 

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625 

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640 

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655 

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675 

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu 
1680 1685 1690 

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705 

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735 

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755 

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770 

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785 

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800 

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala 
1805 1810 1815 

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835 

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850 

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp 
1855 1860 1865 

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro 
1870 1875 1880 

cgg cgt agg tcg cgc aat ttg ggt aag taatagtcga ctttgttccc 18374
Arg Arg Arg Ser Arg Asn Leu Gly Lys 
1885 1890 

actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta aacaacatgt 18434

tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca catactttat 18494 

atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc tgcctgccat 18554 

atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa aagatgaatg 18614 

tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc gccatgatcg 18674

cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg ccaaagcggt 18734
cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat agcgctagca 18794
gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag aggcccggca 18854 
gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg aggatgacga 18914 
tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt aactgtgata 18974
aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta taaaaatcat 19034 
tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta atttgtgagt 19094 
ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg catcttctca 19154 
aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca tcccttccct 19214 
ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca 19274
cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca 19334 
taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga 19394
taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag 19454 
atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta 19514 
cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat 19574 
tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg 19634 
tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata 19694 
atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg 19754 
tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg 19814 
tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct 19874 
tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga 19934 
tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat 19994 
ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc 20054 
ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc 20114
aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20160

<210> 13 
<211> 1892
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core121
<400> 13

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 
20 25 30 

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45 

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60 

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80 

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95 

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125 

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140 

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160 

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175 

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190 

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220 

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240 

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255 

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 
Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320 

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335 

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365 

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380 

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400 

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415 

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445 

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480 

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495 

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510 

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525 

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560 

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575 

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590 

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605 
Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620 

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640 

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655 

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670 

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720 

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735 

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750 

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780 

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815 

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830 

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 
Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925 

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940 

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975 

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990 

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005 

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020 

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215 
Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230 

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390 

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520
Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535 

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775 

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790 

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805 

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser 
1810 1815 1820 
Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855 

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg 
1860 1865 1870 

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg 
1875 1880 1885 

Asn Leu Gly Lys 
1890 



<210> 14 
<211> 20316 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core173

<220> 
<221> CDS 

<222> (12679)..(18510)
<400> 14 



atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980 
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040 
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100 
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160 
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280 
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400 
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640 
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060 
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120 
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720 
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780 
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020 
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140 
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200 
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260 
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380 
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800 
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220 
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240
tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300
aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360
caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420
tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600
agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660


acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759

Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr 
15 20 25 

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807

Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40 
aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855

Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903

Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951

Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999

Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047

Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095

Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143

Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191

Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239

Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287

Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335

Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383

Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431

Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479

Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527

Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575

Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623

Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671

Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719

Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767

Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815

Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863

Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911

Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959

Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007

Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055

Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103

Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151

Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199

Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247

Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520 

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295

His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343

Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555 

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391

Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570 

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439

Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585 

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487

Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600 

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535

Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583

Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635 

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631

Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650 

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679

Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665 

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727

Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680 

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775

Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca get gec cgc gtc act gee ata etc age age etc act gta acc cag 14823 

Ala Ala Ala Arg Val Thr Ala He Leu Ser Ser Leu Thr Val Thr Gin 
700 705 710 715 

etc ctg agg cga ctg cac cag tgg ata age teg gag tgt acc act cca 14 871 

Leu Leu Arg Arg Leu His Gin Trp He Ser Ser Glu Cys Thr Thr Pro 
720 725 730 
tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135 
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala 
1805 1810 1815 

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc ggc gcc cct ctt 18423
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu
1900 1905 1910 1915

gga ggc gct gcc agg gcc ctg gcg cat ggc gtc cgg gtt ctg gaa gac 18471
Gly Gly Ala Ala Arg Ala Leu Ala His Gly Val Arg Val Leu Glu Asp
1920 1925 1930

ggc gtg aac tat gca aca ggg aac ctt cct ggt tgc tct taatagtcga 18520
Gly Val Asn Tyr Ala Thr Gly Asn Leu Pro Gly Cys Ser
1935 1940

ctttgttccc actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta 18580
aacaacatgt tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca 18640
catactttat atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc 18700
tgcctgccat atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa 18760
aagatgaatg tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc 18820
gccatgatcg cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg 18880
ccaaagcggt cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat 18940
agcgctagca gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag 19000
aggcccggca gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg 19060
aggatgacga tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt 19120
aactgtgata aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta 19180
taaaaatcat tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta 19240
atttgtgagt ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 19300
catcttctca aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 19360
tcccttccct ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 19420
acatcatcca cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 19480
ccgggtgtca taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 19540
ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 19600
tctccagtag atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 19660
tcctttgtta cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 19720
ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 19780
aatttgactg tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 19840
ttggcggata atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 19900
tccacatgtg tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 19960
tccttggtgg tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 20020
ttaaatagct tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 20080
ttcgacatga tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 20140
actgggcaat ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg 20200
tgctccttcc ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga 20260
aaccgaaatc aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20316



<210> 15 
<211> 1944 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core173

<400> 15 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900

Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu Gly Gly Ala Ala Arg
1905 1910 1915 1920

Ala Leu Ala His Gly Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala 
1925 1930 1935 

Thr Gly Asn Leu Pro Gly Cys Ser 
1940 



<210> 16 
<211> 20217 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core140

<220> 
<221> CDS 

<222> (12679)..(18411)
<400> 16 

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60 
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180 
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240 
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300 
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360 
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540 
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600 
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660 
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780 
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840 
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900 
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960 
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080 
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140 
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200 
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320 
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980 
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040 
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100 
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160 
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220 
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280 
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340 
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460 
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580 
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640 
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700 
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760 
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880 
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060 
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120 
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180 
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240 
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480 
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540 
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600 
gtcgagttta gatgcaagtt caaggagcga
agcacagaga tatatagcaa agagatactt 
aatattttag tagctcgtta cagtccggtg 
gagcgctttt ggttttcaaa agcgctctga 
tcggaatagg aacttcaaag cgtttccgaa 
tgcgcacata cagctcactg ttcacgtcgc 
tatacatgag aagaacggca tagtgcgtgt 
atttatgtag gatgaaaggt agtctagtac 
gtatcgtatg cttccttcag cactaccctt 
tggattagtc tcatccttca atgctatcat 
ccgagaaact agtgcgaagt agtgatcagg 
cctggccacg gcagaagcac gcttatcgct 
taggcccttc attgaaagaa atgaggtcat 
attttttata gcaaagattg aataaggcgc 
gactaagtta tcttttaata attggtattc 
atttactcgt tttaggactg gttcagaatt 
atcgatgata agctgtcaaa catgagaatt 
tatttttata ggttaatgtc atgataataa 
ggggaaatgt gcgcggaacc cctatttgtt 
cgctcatgag acaataaccc tgataaatgc 
gtattcaaca tttccgtgtc gcccttattc 
ttgctcaccc agaaacgctg gtgaaagtaa 
tgggttacat cgaactggat ctcaacagcg 
aacgttttcc aatgatgagc acttttaaag 
ttgacgccgg gcaagagcaa ctcggtcgcc 
agtactcacc agtcacagaa aagcatctta 
gtgctgccat aaccatgagt gataacactg 
gaccgaagga gctaaccgct tttttgcaca 
gttgggaacc ggagctgaat gaagccatac 




aaggtggatg ggtaggttat atagggatat 6660 
ttgagcaatg tttgtggaag cggtattcgc 6720
cgtttttggt tttttgaaag tgcgtcttca 6780 
agttcctata ctttctagag aataggaact 6840 
aacgagcgct tccgaaaatg caacgcgagc 6900 
acctatatct gcgtgttgcc tgtatatata 6960
ttatgcttaa atgcgtactt atatgcgtct 7020 
ctcctgtgat attatcccat tccatgcggg 7080
tagctgttct atatgctgcc actcctcaat 7140 
ttcctttgat attggatcat atgcatagta 7200 
tattgctgtt atctgatgag tatacgttgt 7260 
ccaatttccc acaacattag tcaactccgt 7320
caaatgtctt ccaatgtgag attttgggcc 7380
atttttcttc aaagctttat tgtacgatct 7440
ctgtttattg cttgaagaat tgccggtcct 7500 
cctcaaaaat tcatccaaat atacaagtgg 7560 
cttgaagacg aaagggcctc gtgatacgcc 7620
tggtttctta gacgtcaggt ggcacttttc 7680 
tatttttcta aatacattca aatatgtatc 7740 
ttcaataata ttgaaaaagg aagagtatga 7800 
ccttttttgc ggcattttgc cttcctgttt 7860 
aagatgctga agatcagttg ggtgcacgag 7920
gtaagatcct tgagagtttt cgccccgaag 7980
ttctgctatg tggcgcggta ttatcccgtg 8040 
gcatacacta ttctcagaat gacttggttg 8100 
cggatggcat gacagtaaga gaattatgca 8160 
cggccaactt acttctgaca acgatcggag 8220 
acatggggga tcatgtaact cgccttgatc 8280
caaacgacga gcgtgacacc acgatgcctg 8340 
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520 
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640 
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700 
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940 
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060 
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240 
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780 
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840 
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080 
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320 
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440 
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800 
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700 
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880

cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940 

ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000 

tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060 

tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120 

tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180 

ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240

tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300 

aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360 

caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420

tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480

ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540 

gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600 

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr 
15 20 25 

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40 

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55 

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75 

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90 

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105 

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120 

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135 

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155 

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly 
160 165 170 

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185 

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly 
190 195 200 

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215 

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235 

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250 

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265 

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala 
270 275 280 

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu 
285 290 295 

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623 
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315 

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330 

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360 

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375 

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395 

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410 

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu 
415 420 425 

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440 

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455 

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475 

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490 

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505 

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520 

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535 

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555 

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570 
ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585 

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600 

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615 

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635 

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650 

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665 

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680 

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp 
685 690 695 

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715 

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730 

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745 

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760 

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775 

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795 

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810 

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159 
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825 

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp 
830 835 840 

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255 
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855 

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875 

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His 
880 885 890 

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe 
895 900 905 

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920 

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser 
925 930 935 

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955 

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970 

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985 

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000 

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015 

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050 

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu 
1055 1060 1065 

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro 
1070 1075 1080 

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr 
1085 1090 1095 

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser 
1100 1105 1110 1115 

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130 

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser 
1135 1140 1145 

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly 
1150 1155 1160 

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys 
1165 1170 1175 

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala 
1180 1185 1190 1195 

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210 

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225 

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240 

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255 
gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His 
1260 1265 1270 1275 

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His 
1280 1285 1290 

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305 

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320 

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335 

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355 

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr 
1360 1365 1370 

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385 

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys 
1390 1395 1400 

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415 

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435 

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly 
1440 1445 1450 

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr 
1455 1460 1465 

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480 

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495 

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515 

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly 
1520 1525 1530 

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545 

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr 
1550 1555 1560 

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr 
1565 1570 1575 

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595 

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610 

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625 

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640 

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655 

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675 

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu 
1680 1685 1690 

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705 

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720 

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735 

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755 

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770 

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785 

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800 

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135 
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala 
1805 1810 1815 

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835 

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850 

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp 
1855 1860 1865 

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro 
1870 1875 1880 

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895 

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc taatagtcga 18421

Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
1900 1905 1910 

ctttgttccc actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta 18481

aacaacatgt tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca 18541 

catactttat atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc 18601 

tgcctgccat atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa 18661 
aagatgaatg tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc 18721
gccatgatcg cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg 18781 
ccaaagcggt cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat 18841 
agcgctagca gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag 18901 
aggcccggca gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg 18961 
aggatgacga tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt 19021 
aactgtgata aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta 19081 
taaaaatcat tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta 19141 
atttgtgagt ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 19201
catcttctca aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 19261 
tcccttccct ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 19321 
acatcatcca cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 19381 
ccgggtgtca taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 19441 
ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 19501 
tctccagtag atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 19561 
tcctttgtta cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 19621 
ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 19681 
aatttgactg tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 19741 
ttggcggata atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 19801 
tccacatgtg tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 19861 
tccttggtgg tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 19921 
ttaaatagct tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 19981 
ttcgacatga tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 20041
actgggcaat ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg 20101 
tgctccttcc ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga 20161 
aaccgaaatc aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20217

<210> 17 

<211> 1911 

<212> PRT 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core140

<400> 17 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 
20 25 30 

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45 

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60 

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80 

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95 

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125 

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140 

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160 

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175 

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190 

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220 

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240 

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320 

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335 

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365 

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380 

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400 

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415 

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu 
435 440 445 

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu 
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln 
465 470 475 480 

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu 
485 490 495 

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr 
500 505 510 

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe 
515 520 525 

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn 
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro 
545 550 555 560 

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val 
565 570 575 
Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala 
580 585 590 

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu 
595 600 605 

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val 
610 615 620 

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val 
625 630 635 640 

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val 
645 650 655 

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala 
660 665 670 

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His 
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu 
705 710 715 720 

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp 
725 730 735 

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys 
740 745 750 

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe 
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile 
770 775 780 

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys 
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp 
805 810 815 

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro 
820 825 830 

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly 
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu 
865 870 875 880 
Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val 
915 920 925 

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu 
930 935 940 

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr 
965 970 975 

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu 
980 985 990 

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn 
995 1000 1005 

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp 
1010 1015 1020 

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg 
1025 1030 1035 1040 

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro 
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile 
1105 1110 1115 1120 

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys 
1185 1190 1195 1200 

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu 
1205 1210 1215 

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val 
1220 1225 1230 

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu 
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280 

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr 
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln 
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp 
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360 

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser 
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys 
1380 1385 1390 

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp 
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu 
1425 1430 1435 1440 

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu 
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys 
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr 
1505 1510 1515 1520 

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro 
1525 1530 1535 

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val 
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp 
1585 1590 1595 1600 

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg 
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr 
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly 
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg 
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680 

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr 
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser 
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val 
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala 
1745 1750 1755 1760 

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro 
1765 1770 1775 

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp 
1780 1785 1790 

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu 
1795 1800 1805 

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser 
1810 1815 1820 

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg 
1825 1830 1835 1840 

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu 
1845 1850 1855 

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg 
1860 1865 1870 

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg 
1875 1880 1885 

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu 
1890 1895 1900 

Met Gly Tyr Ile Pro Leu Val 
1905 1910 



<210> 18 
<211> 20247 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd. delta. NS3NS5.pj .corelSO 

<220> 
<221> CDS 

<222> (12679)..(18441) 
<400> 18 

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60 
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120 
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180 
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240 
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300 
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360 
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420 
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480 
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540 
atagctcatg aatgtggctc tcttgattgc 
ataggttagt tcagcagcac ataatgctat 
cacaaactga cgaacaagca ccttaggtgg 
gcttagcgcc gatcttgtgt gcaattgata 
tcttgcagta ttcaaacacg ctaactcgaa 



cgttctttcg 


aaaaatgcac 


cggccgcgca 


cggtatcttc 


atttcatatt 


ttaaaaatgc 


ggcccgcagg 


ttcgttttgc 


ggtactatct 


tgcacagatt 


ttataatgta 


ataagcaaga 


agaaaaccaa 


aatggacgac 


attgaaacag 


cttatagcgt 


ctgggatgta 


tgtcggctgt 


ttgatataga 


gagtaaacgt 


aagtctgatg 


ccatggaatc 


tctcacaacc 


ggtaggccgt 


gcgtatcttc 


tgactccagt 


gctgaggtaa 


ggtttgattc 


gattggaaat 


ggtatgctct 


atttgatgct 


acagaataac 


aagctgttag 


ctataataat 


aggaagattg 


cccgagaaag 


gaaaaatgga 


ttgtacacag 


ttattagtcc 


agctcgtaag 


cgtcgttacc 


caattgctta 


taataggtga 


tttattcatc 


ccggaatctc 


tcrcrccrcrcacia 



gaafccgt tfca 


cagcaaaaaa 


accatgctaa 


tacaaatgaa 


gaagttccct 


caagaggagc 


atataaatta 


caaaacacca 


aaaaaaggag 


agtagcaacg 


agggtaaggg 


gatccaatat 


caaaggaaat 


gatagcattg 


cagcatatag 


aacagctaaa 


gggtagtgct 


gggataatat 


cacaggaggt 


actagactac 


gtacgcattt 


aagcataaac 


acgcactatg 


caacacgcag 


atataggtgc 


gacgtgaaca 




tgttccgtta tgtgtaatca tccaacataa 600 
tttctcacct gaaggtcttt caaacctttc 660 
tgttttacat aatatatcaa attgtggcat 720 
tctagtttca actactctat ttatcttgta 780 
aaactaactt taattgtcct gtttgtctcg 840 
ttatttgtac tgcgaaaata attggtactg 900 
acctttgctg cttttcctta atttttagac 960 
tgtgataaaa agttgttttg acatgtgatc 1020 
atacattatc aaacgaacaa tactggtaaa 1080 
ccaagaatct gacggtaaaa gcacgtacag 1140 
ttattgaaat gattgctcct gatgtagata 1200 
agctactctt tccaggatat gtcataaggc 1260 
atggtcttga ttctagcgca gaagattcca 1320 
ttttgcctgc tgcgaagatg gttaaggaaa 1380 
cttcacaaga agcaagtcag gctgccatag 1440 
acaatagaaa gcaactatac aaatctattg 1500 
acaagaagag agctaccgaa atgctcatga 1560 
caccagctcc aacggaagaa gatgttatga 1620 
ctttagttcc accagatcgt caagctgctt 1680 
taaaggatat attcaatagt ttcaatgaac 1740 
agagtgagtt ggaaggaagg actgaagtga 1800 
ccaggcgaac aagaagtaga gacacaaatg 1860 
tcactgaggg ccctaaagcg gttcccacga 1920 
gcagaaaatc acgtaatact tctagggtat 1980 
aaggatgaga ctaatccaat tgaggagtgg 2040 
gaaggaagca tacgataccc cgcatggaat 2100 
ctttcatcct acataaatag acgcatataa 2160 
ccgttcttct catgtatata tatatacagg 2220 
gtgagctgta tgtgcgcagc tcgcgttgca 2280 
ttttcggaag cgctcgtttt cggaaacgct 
ctagaaagta taggaacttc agagcgcttt 
ttcaaaaaac caaaaacgca ccggactgta 
tccacaaaca ttgctcaaaa gtatctcttt 
aacctaccca tccacctttc gctccttgaa 
aggcttccaa tgctcttcaa attttactgt 
ctcttcataa tgtaagctta tctttatcga 
ctttacggtt ccctgagatt gaattagttc 
ctttgtacga cgaattttga ggttcgccat 
tattatctcc gcctcagttt gatcttccgc 
tatttcaccc cacaatcctt catccgcctc 
atgttgtaca ttgtttagtt cacgagaagg 
tatatgacct ttatcctgtt ctctttccac 
gcacctaata acattcttca aggcggagaa 
tgaaaacgtg agaatgaatt tagtattatt 
tcgaagataa gagaagaatg cagtgacctt 
aaaaaatacg cctttaggcc ttctgatacc 
attaatatct aaaccctctc cgatggtggc 
aaactgtgat aattctgggt gatttatgat 
aggatcaggc caatccagtt ctttttcaat 
tccaacaaat gcaaatgcta acgttttgta 
cccccttgtc gtctcgatta cacacctact 
cataatacat tgcttaatac aagcaagcag 
cattacagct gatgtcattg tatatcagcg 
tcgcggtttt tataaacaaa actttcgtta 
ttggaaattc gggaaaaagt agagcaacgc 
ttaacttcga gaagggatta aggctaattt 
ccattgaatg ccttataaaa cagctataga 
tttgtcaaag cttactgatg atgatgtgtc 




ttgaagttcc tattccgaag ttcctattct 2340 
tgaaaaccaa aagcgctctg aagacgcact 2400 
acgagctact aaaatattgc gaataccgct 2460 
gctatatatc tctgtgctat atccctatat 2520 
cttgcatcta aactcgacct ctacatcaac 2580 
caagtagacc catacggctg taatatgctg 2640 
atcgtgtgaa aaactactac cgcgataaac 2700 
ctttagtata tgatacaaga cacttttgaa 2760 
cctctggcta tttccaatta tcctgtcggc 2820 
ttcagactgc catttttcac ataatgaatc 2880 
cgcatcttgt tccgttaaac tattgacttc 2940 
gtcctcttca ggcggtagct cctgatctcc 3000 
aaacttagaa atgtattcat gaattatgga 3060 
gtttgggcca gatgcccaat atgcttgaca 3120 
gtgatattct gaggcaattt tattataatc 3180 
tgtattgaca aatggagatt ccatgtatct 3240 
ctttcccctg cggtttagcg tgccttttac 3300 
ctttaactga ctaataaatg caaccgatat 3360 
tcgatcgaca attgtattgt acactagtgc 3420 
taccggtgtg tcgtctgtat tcagtacatg 3480 
tttcttataa ttgtcaggaa ctggaaaagt 3540 
ttcatcgtac accataggtt ggaagtgctg 3600 
tctctcgcca ttcatatttc agttattttc 3660 
ctgtaaaaat ctatctgtta cagaaggttt 3720 
cgaaatcgag caatcacccc agctgcgtat 3780 
gagttgcatt ttttacacca taatgcatga 3840 
cactagtatg tttcaaaaac ctcaatctgt 3900 
ttgcatagaa gagttagcta ctcaatgctt 3960 
tactttcagg cgggtctgta gtaaggagaa 4020 
tgacattata 


aagctggcac 


ttagaattcc 


tctactgtac 


gatacacttc 


cgctcaggtc 


ttgttactct 


attgatccag 


ctcagcaaag 


tgtagtaaaa 


ctagctagac 


cgagaaagag 


gctgccatca 


ttattatccg 


atgtgacgct 


tttttttttt 


tttttttttt 


ttttttggta 


agcaaggatt 


ttcttaactt 


cttcggcgac 


accacctaaa 


tcaccagttc 


tgatacctgc 


ggctttacct 


tcttcaggca 


agttcaatga 


agtggcgata 


gggttgacct 


tattctttgg 


gtacaaacca 


aatgcggtgt 


tcttgtctgg 


acccaaggag 


cctgggataa 


cggaggcttc 


ggtgattata 


ataccattta 


ggtgggttgg 


aatcaattga 


tgttgaactt 


tcaatgtagg 


ttttctccat 


aatcttgaag 


aggccaaaac 


tggtggctca 


tgttgtaggg 


ccatgaaagc 


aacggtgtat 


tgttcactat 


cccaagcgac 


aaagtaaata 


cctcccacta 


attctctaac 


tggcttgatt 


ggagataagt 


ctaaaagaga 


ggcgtacaat 


tgaagttctt 


tacggatttt 


ggtaccccat 


ttaggaccac 


ccacagcacc 


ttccagcgcc 


tcatctggaa 


gtggaacacc 


atgattttcg 


aaatcgaact 


tgacattgga 


aatggcttcg 


gctgtgattt 


cttgaccaac 


aggggcagac 


attacaatgg 


tatatccttg 


aaaaaaaaaa 


atgcagcttc 


tcaatgatat 


tatccgacaa 


actgttttac 


agatttacga 


acatccgaac 


ctgggagttt 


tccctgaaac 


tatagtctag 


cgctttacgg 


aagacaatgt 



acggactata gactatacta gtatactccg 4080 
cttgtccttt aacgaggcct taccactctt 4140 
gcagtgtgat ctaagattct atcttcgcga 4200 
actagaaatg caaaaggcac ttctacaatg 4260 
gcattttttt tttttttttt tttttttttt 4320 
caaatatcat aaaaaaagag aatcttttta 4380 
agcatcaccg acttcggtgg tactgttgga 4440 
atccaaaacc tttttaactg catcttcaat 4500 
caatttcaac atcattgcag cagacaagat 4560 
caaatctgga gcggaaccat ggcatggttc 4620 
caaagaggcc aaggacgcag atggcaacaa 4680 
atcggagatg atatcaccaa acatgttgct 4740 
gttcttaact aggatcatgg cggcagaatc 4800 
gaattcgttc ttgatggttt cctccacagt 4860 
attagcttta tccaaggacc aaataggcaa 4920 
ggccattctt gtgattcttt gcacttctgg 4980 
accatcacca tcgtcttcct ttctcttacc 5040 
aacaacgaag tcagtacctt tagcaaattg 5100 
gtcggatgca aagttacatg gtcttaagtt 5160 
tagtaaacct tgttcaggtc taacactacc 5220 
taacaaaacg gcatcagcct tcttggaggc 5280 
tgtagcatcg atagcagcac caccaattaa 5340 
acgaacatca gaaatagctt taagaacctt 5400 
gtggtcacct ggcaaaacga cgatcttctt 5460 
aaatatatat aaaaaaaaaa aaaaaaaaaa 5520 
tcgaatacgc tttgaggaga tacagcctaa 5580 
tcgtacttgt tacccatcat tgaattttga 5640 
agatagtata tttgaacctg tataataata 5700 
atgtatttcg gttcctggag aaactattgc 5760 
atctattgca 


taggtaatct 


tgcacgtcgc 


tgcacttcaa 


tagcatatct 


ttgttaacga 


atgcaacgcg 


agagcgctaa 


tttttcaaac 


gaaatgcaac 


gcgaaagcgc 


tattttacca 


caaaaatgca 


acgcgagagc 


gctaattttt 


gaacagaaat 


gcaacgcgag 


agcgctattt 


ttctacaaaa 


atgcatcccg 


agagcgctat 


tttctccttt 


gtgcgctcta 


taatgcagtc 


taaggttaga 


agaaggctac 


tttggtgtct 


cacttcccgc 


gtttactgat 


tactagcgaa 


atccccgatt 


atattctata 


ccgatgtgga 


gcgttgatga 


ttcttcattg 


gtcagaaaat 


atactacgta 


taggaaatgt 


ttacattttc 


tcttactaca 


atttttttgt 


ctaaagagta 


gtcgagttta 


gatgcaagtt 


caaggagcga 


agcacagaga 


tatatagcaa 


agagatactt 


aatattttag 


tagctcgtta 


cagtccggtg 


gagcgctttt 


ggttttcaaa 


agcgctctga 


tcggaatagg 


aacttcaaag 


cgtttccgaa 


tgcgcacata 


cagctcactg 


ttcacgtcgc 


tatacatgag 


aagaacggca 


tagtgcgtgt 


atttatgtag 


gatgaaaggt 


agtctagtac 


gtatcgtatg 


cttccttcag 


cactaccctt 


tggattagtc 


tcatccttca 


atgctatcat 


ccgagaaact 


agtgcgaagt 


agtgatcagg 


cctggccacg 


gcagaagcac 


gcttatcgct 


taggcccttc 


attgaaagaa 


atgaggtcat 


attttttata 


gcaaagattg 


aataaggcgc 


gactaagtta 


tcttttaata 


attggtattc 



atccccggtt cattttctgc gtttccatct 5820 
agcatctgtg cttcattttg tagaacaaaa 5880 
aaagaatctg agctgcattt ttacagaaca 5940 
acgaagaatc tgtgcttcat ttttgtaaaa 6000 
caaacaaaga atctgagctg catttttaca 6060 
taccaacaaa gaatctatac ttcttttttg 6120 
ttttctaaca aagcatctta gattactttt 6180 
tcttgataac tttttgcact gtaggtccgt 6240 
attttctctt ccataaaaaa agcctgactc 6300 
gctgcgggtg cattttttca agataaaggc 6360 
ttgcgcatac tttgtgaaca gaaagtgata 6420 
tatgaacggt ttcttctatt ttgtctctat 6480 
gtattgtttt cgattcactc tatgaatagt 6540 
atactagaga taaacataaa aaatgtagag 6600 
aaggtggatg ggtaggttat atagggatat 6660 
ttgagcaatg tttgtggaag cggtattcgc 6720 
cgtttttggt tttttgaaag tgcgtcttca 6780 
agttcctata ctttctagag aataggaact 6840 
aacgagcgct tccgaaaatg caacgcgagc 6900 
acctatatct gcgtgttgcc tgtatatata 6960 
ttatgcttaa atgcgtactt atatgcgtct 7020 
ctcctgtgat attatcccat tccatgcggg 7080 
tagctgttct atatgctgcc actcctcaat 7140 
ttcctttgat attggatcat atgcatagta 7200 
tattgctgtt atctgatgag tatacgttgt 7260 
ccaatttccc acaacattag tcaactccgt 7320 
caaatgtctt ccaatgtgag attttgggcc 7380 
atttttcttc aaagctttat tgtacgatct 7440 
ctgtttattg cttgaagaat tgccggtcct 7500 
atttactcgt tttaggactg gttcagaatt 
atcgatgata agctgtcaaa catgagaatt 
tatttttata ggttaatgtc atgataataa 
ggggaaatgt gcgcggaacc cctatttgtt 
cgctcatgag acaataaccc tgataaatgc 
gtattcaaca tttccgtgtc gcccttattc 
ttgctcaccc agaaacgctg gtgaaagtaa 
tgggttacat cgaactggat ctcaacagcg 
aacgttttcc aatgatgagc acttttaaag 
ttgacgccgg gcaagagcaa ctcggtcgcc 
agtactcacc agtcacagaa aagcatctta 
gtgctgccat aaccatgagt gataacactg 
gaccgaagga gctaaccgct tttttgcaca 
gttgggaacc ggagctgaat gaagccatac 
cagcaatggc aacaacgttg cgcaaactat 
ggcaacaatt aatagactgg atggaggcgg 
cccttccggc tggctggttt attgctgata 
gtatcattgc agcactgggg ccagatggta 
cggggagtca ggcaactatg gatgaacgaa 
tgattaagca ttggtaactg tcagaccaag 
aacttcattt ttaatttaaa aggatctagg 
aaatccctta acgtgagttt tcgttccact 
gatcttcttg agatcctttt tttctgcgcg 
cgctaccagc ggtggtttgt ttgccggatc 
ctggcttcag cagagcgcag ataccaaata 
accacttcaa gaactctgta gcaccgccta 
tggctgctgc cagtggcgat aagtcgtgtc 
cggataaggc gcagcggtcg ggctgaacgg 
gaacgaccta caccgaactg agatacctac 



cctcaaaaat tcatccaaat atacaagtgg 7560 
cttgaagacg aaagggcctc gtgatacgcc 7620 
tggtttctta gacgtcaggt ggcacttttc 7680 
tatttttcta aatacattca aatatgtatc 7740 
ttcaataata ttgaaaaagg aagagtatga 7800 
ccttttttgc ggcattttgc cttcctgttt 7860 
aagatgctga agatcagttg ggtgcacgag 7920 
gtaagatcct tgagagtttt cgccccgaag 7980 
ttctgctatg tggcgcggta ttatcccgtg 8040 
gcatacacta ttctcagaat gacttggttg 8100 
cggatggcat gacagtaaga gaattatgca 8160 
cggccaactt acttctgaca acgatcggag 8220 
acatggggga tcatgtaact cgccttgatc 8280 
caaacgacga gcgtgacacc acgatgcctg 8340 
taactggcga actacttact ctagcttccc 8400 
ataaagttgc aggaccactt ctgcgctcgg 8460 
aatctggagc cggtgagcgt gggtctcgcg 8520 
agccctcccg tatcgtagtt atctacacga 8580 
atagacagat cgctgagata ggtgcctcac 8640 
tttactcata tatactttag attgatttaa 8700 
tgaagatcct ttttgataat ctcatgacca 8760 
gagcgtcaga ccccgtagaa aagatcaaag 8820 
taatctgctg cttgcaaaca aaaaaaccac 8880 
aagagctacc aactcttttt ccgaaggtaa 8940 
ctgtccttct agtgtagccg tagttaggcc 9000 
catacctcgc tctgctaatc ctgttaccag 9060 
ttaccgggtt ggactcaaga cgatagttac 9120 
ggggttcgtg cacacagccc agcttggagc 9180 
agcgtgagct atgagaaagc gccacgcttc 9240 
ccgaagggag 


aaaggcggac 


aggtatccgg 


cgagggagct 


tccaggggga 


aacgcctggt 


tctgacttga 


gcgtcgattt 


ttgtgatgct 


ccagcaacgc 


ggccttttta 


cggttcctgg 


ttcctgcgtt 


atcccctgat 


tctgtggata 


ccgctcgccg 


cagccgaacg 


accgagcgca 


gcctgatgcg 


gtattttctc 


cttacgcatc 


ctctcagtac 


aatctgctct 


gatgccgcat 


acgtgactgg 


gtcatggctg 


cgccccgaca 


ggcttgtctg 


ctcccggcat 


ccgcttacag 


gtgtcagagg 


ttttcaccgt 


catcaccgaa 


agcgtggtcg 


tgaagcgatt 


cacagatgtc 


tttctccaga 


agcgttaatg 


tctggcttct 


ttcctgtttg 


gtcactgatg 


cctccgtgta 


accgatgaaa 


cgagagagga 


tgctcacgat 


actggaacgt 


tgtgagggta 


aacaactggc 


cactcagggt 


caatgccagc 


gcttcgttaa 


gcagcatcct 


gcgatgcaga 


tccggaacat 


cagactttac 


gaaacacgga 


aaccgaagac 


tttgcagcag 


cagtcgcttc 


acgttcgctc 


aaggcaaccc 


cgccagccta 


gccgggtcct 


tggccaggac 


ccaacgctgc 


ccgagatgcg 


gatggatatg 


ttctgccaag 


ggttggtttg 


ggctccaatt 


cttggagtgg 


tgaatccgtt 


gaggtggccc 


ggctccatgc 


accgcgacgc 


gcgcctacaa 


tccatgccaa 


cccgttccat 


acgatcagcg 


gtccaatgat 


cgaagttagg 


tgtccctgat 


ggtcgtcatc 


tacctgcctg 


atgccgccgg 


aagcgagaag 


aatcataatg 



taagcggcag ggtcggaaca ggagagcgca 9300 
atctttatag tcctgtcggg tttcgccacc 9360 
cgtcaggggg gcggagccta tggaaaaacg 9420 
ccttttgctg gccttttgct cacatgttct 9480 
accgtattac cgcctttgag tgagctgata 9540 
gcgagtcagt gagcgaggaa gcggaagagc 9600 
tgtgcggtat ttcacaccgc atatggtgca 9660 
agttaagcca gtatacactc cgctatcgct 9720 
cccgccaaca cccgctgacg cgccctgacg 9780 
acaagctgtg accgtctccg ggagctgcat 9840 
acgcgcgagg cagctgcggt aaagctcatc 9900 
tgcctgttca tccgcgtcca gctcgttgag 9960 
gataaagcgg gccatgttaa gggcggtttt 10020 
agggggattt ctgttcatgg gggtaatgat 10080 
acgggttact gatgatgaac atgcccggtt 10140 
ggtatggatg cggcgggacc agagaaaaat 10200 
tacagatgta ggtgttccac agggtagcca 10260 
aatggtgcag ggcgctgact tccgcgtttc 10320 
cattcatgtt gttgctcagg tcgcagacgt 10380 
gcgtatcggt gattcattct gctaaccagt 10440 
caacgacagg agcacgatca tgcgcacccg 10500 
ccgcgtgcgg ctgctggaga tggcggacgc 10560 
cgcattcaca gttctccgca agaattgatt 10620 
agcgaggtgc cgccggcttc cattcaggtc 10680 
aacgcgggga ggcagacaag gtatagggcg 10740 
gtgctcgccg aggcggcata aatcgccgtg 10800 
ctggtaagag ccgcgagcga tccttgaagc 10860 
gacagcatgg cctgcaacgc gggcatcccg 10920 
gggaaggcca tccagcctcg cgtcgcgaac 10980 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220 
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280 
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340 
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520 
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640 
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700 
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820 
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880 
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940 
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000 
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060 
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120 
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180 
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240 
tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300 
aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360 
caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420 
tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480 
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540 
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600 
agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660 
acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711 
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val 
1 5 10 

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759 
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr 
15 20 25 

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807 
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg 
30 35 40 

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855 
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe 
45 50 55 

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903 
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys 
60 65 70 75 

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951 
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr 
80 85 90 

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999 
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala 
95 100 105 

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047 
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu 
110 115 120 

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095 
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala 
125 130 135 

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143 
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His 
140 145 150 155 

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191 
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly 
160 165 170 

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239 
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro 
175 180 185 

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287 
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly 
190 195 200 

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335 
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr 
205 210 215 

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383 
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile 
220 225 230 235 

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431 
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr 
240 245 250 

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479 
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg 
255 260 265 

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527 
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala 
270 275 280 

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575 
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu 
285 290 295 

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623 
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu 
300 305 310 315 

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671 
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His 
320 325 330 

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719 
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val 
335 340 345 

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767 
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser 
350 355 360 

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815 
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His 
365 370 375 

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863 
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile 
380 385 390 395 

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911 
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala 
400 405 410 

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959 
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu 
415 420 425 

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007 
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val 
430 435 440 

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055 
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu 
445 450 455 
gtc ctc tac 
Val Leu Tyr 
460 

ccg tac atc 
Pro Tyr Ile 



gcc ctc ggc 
Ala Leu Gly 



cct gct gtc 
Pro Ala Val 
510 

cat atg tgg 
His Met Trp 
525 

acg ctg cct 
Thr Leu Pro 
540 

gct gtc acc 
Ala Val Thr 



ttg ggg ggg 
Leu Gly Gly 



gcc ttt gtg 
Ala Phe Val 
590 

ctg ggg aag 
Leu Gly Lys 
605 

gcg gga gct 
Ala Gly Ala 
620 

acg gag gac 
Thr Glu Asp 



ctc gta gtc 
Leu Val Val 



ccg ggc gag 
Pro Gly Glu 
670 

tcc cgg ggg 



cga gag ttc 
Arg Glu Phe 
465 

gag caa ggg 
Glu Gln Gly 
480 

ctc ctg cag 
Leu Leu Gln 

495 

cag acc aac 
Gln Thr Asn 



aac ttc atc 
Asn Phe Ile 



ggt aac ccc 
Gly Asn Pro 
545 

agc cca cta 
Ser Pro Leu 
560 

tgg gtg gct 
Trp Val Ala 
575 

ggc gct ggc 
Gly Ala Gly 



gtc ctc ata 
Val Leu Ile 



ctt gtg gca 
Leu Val Ala 
625 

ctg gtc aat 
Leu Val Asn 
640 

ggc gtg gtc 
Gly Val Val 
655 

ggg gca gtg 

Gly Ala Val 



aac cat gtt 



gat gag atg 
Asp Glu Met 



atg atg ctc 
Met Met Leu 



acc gcg tcc 
Thr Ala Ser 
500 

tgg caa aaa 
Trp Gln Lys 
515 

agt ggg ata 
Ser Gly Ile 
530 

gcc att gct 
Ala Ile Ala 



acc act agc 
Thr Thr Ser 



gcc cag ctc 
Ala Gln Leu 
580 

tta gct ggc 
Leu Ala Gly 
595 

gac atc ctt 
Asp Ile Leu 
610 

ttc aag atc 
Phe Lys Ile 



cta ctg ccc 
Leu Leu Pro 



tgt gca gca 
Cys Ala Ala 
660 

cag tgg atg 
Gin Trp Met 
675 

tcc ccc acg 



gaa gag tgc 
Glu Glu Cys 
470 

gcc gag cag 
Ala Glu Gln 
485 

cgt cag gca 
Arg Gln Ala 



ctc gag acc 
Leu Glu Thr 



caa tac ttg 
Gln Tyr Leu 
535 

tca ttg atg 
Ser Leu Met 
550 

caa acc ctc 
Gln Thr Leu 
565 

gcc gcc ccc 
Ala Ala Pro 



gcc gcc atc 
Ala Ala Ile 



gca ggg tat 
Ala Gly Tyr 
615 

atg agc ggt 
Met Ser Gly 
630 

gcc atc ctc 
Ala Ile Leu 
645 

ata ctg cgc 
Ile Leu Arg 



aac cgg ctg 
Asn Arg Leu 



cac tac gtg 



tct cag cac 
Ser Gln His 



ttc aag cag 
Phe Lys Gln 
490 

gag gtt atc 
Glu Val Ile 
505 

ttc tgg gcg 
Phe Trp Ala 
520 

gcg ggc ttg 
Ala Gly Leu 



gct ttt aca 
Ala Phe Thr 



ctc ttc aac 
Leu Phe Asn 
570 

ggt gcc gct 
Gly Ala Ala 
585 

ggc agt gtt 
Gly Ser Val 
600 

ggc gcg ggc 
Gly Ala Gly 



gag gtc ccc 
Glu Val Pro 



tcg ccc gga 
Ser Pro Gly 
650 

cgg cac gtt 
Arg His Val 
665 

ata gcc ttc 
Ile Ala Phe 
680 

ccg gag agc 



tta 14103 

Leu 

475 

aag 14151 
Lys 



gcc 14199 
Ala 



aag 14247 
Lys 



tca 14295 
Ser 



gct 14343 

Ala 

555 

ata 14391 
Ile 



act 14439 
Thr 



gga 14487 
Gly 



gtg 14535 
Val 



tcc 14583 

Ser 

635 

gcc 14631 
Ala 



ggc 14679 
Gly 



gcc 14727 
Ala 



gat 14775 
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp 
685 690 695 

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823 
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln 
700 705 710 715 

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730 

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745 

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760 

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775 

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795 

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810 

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825 

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp 
830 835 840 

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855 

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303 
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875 

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His 
880 885 890 

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe 
895 900 905 

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser 
925 930 935 

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955 

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970 

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985 

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000 

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015 

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035 

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050 

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu 
1055 1060 1065 

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro 
1070 1075 1080 

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr 
1085 1090 1095 

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser 
1100 1105 1110 1115 

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130 

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser 
1135 1140 1145 
atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly 
1150 1155 1160 

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys 
1165 1170 1175 

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala 
1180 1185 1190 1195 

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210 

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225 

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240 

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255 

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His 
1260 1265 1270 1275 

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His 
1280 1285 1290 

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305 

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320 

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335 

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355 

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr 
1360 1365 1370 

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385 

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys 
1390 1395 1400 

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415 

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435 

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly 
1440 1445 1450 

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr 
1455 1460 1465 

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480 

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495 

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515 

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly 
1520 1525 1530 

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545 

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr 
1550 1555 1560 

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595 

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625 

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640 

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655 

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675 

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu 
1680 1685 1690 

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705 

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720 

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735 

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755 

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770 

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785 

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800 

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135 
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala 
1805 1810 1815 

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835 
atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850 

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp 
1855 1860 1865 

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro 
1870 1875 1880 

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895 

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc ggc gcc cct ctt 18423
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu
1900 1905 1910 1915 

gga ggc gct gcc agg gcc taatagtcga ctttgttccc actgtacttt 18471
Gly Gly Ala Ala Arg Ala 
1920 

tagctcgtac aaaatacaat atacttttca tttctccgta aacaacatgt tttcccatgt 18531
aatatccttt tctatttttc gttccgttac caactttaca catactttat atagctattc 18591 
acttctatac actaaaaaac taagacaatt ttaattttgc tgcctgccat atttcaattt 18651 
gttataaatt cctataattt atcctattag tagctaaaaa aagatgaatg tgaatcgaat 18711 
cctaagagaa ttggatctga tccacaggac gggtgtggtc gccatgatcg cgtagtcgat 18771
agtggctcca agtagcgaag cgagcaggac tgggcggcgg ccaaagcggt cggacagtgc 18831
tccgagaacg ggtgcgcata gaaattgcat caacgcatat agcgctagca gcacgccata 18891
gtgactggcg atgctgtcgg aatggacgat atcccgcaag aggcccggca gtaccggcat 18951
aaccaagcct atgcctacag catccagggt gacggtgccg aggatgacga tgagcgcatt 19011
gttagatttc atacacggtg cctgactgcg ttagcaattt aactgtgata aactaccgca 19071
ttaaagcttt ttctttccaa tttttttttt ttcgtcatta taaaaatcat tacgaccgag 19131
attcccgggt aataactgat ataattaaat tgaagctcta atttgtgagt ttagtataca 19191
tgcatttact tataatacag ttttttagtt ttgctggccg catcttctca aatatgcttc 19251
ccagcctgct tttctgtaac gttcaccctc taccttagca tcccttccct ttgcaaatag 19311
tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca cggttctata 19371
ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca taatcaacca 19431
atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga taacaaaatc 19491
tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag atagggagcc 19551
cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta cttcttctgc 19611 
cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat tcgtaatgtc 19671 
tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg tattaccaat 19731 
gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata atgcctttag 19791
cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg tttttagtaa 19851 
acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg tacgaacatc 19911 
caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct tggcagcaac 19971 
aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga tttatcttcg 20031
tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat ttcatgtttc 20091
ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc ttcgttcttc 20151 
cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc aaaaaaaaga 20211 
ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20247 



<210> 19 
<211> 1921 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core150

<400> 19 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 

20 25 30 

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45 

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60 

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80 

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95 

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 
Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125 

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140 

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160 

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175 

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190 

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220 

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240 

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255 

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320 

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335 

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365 

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380 

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400 

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415 
Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445 

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480 

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495 

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510 

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525 

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560 

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575 

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590 

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605 

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620 

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640 

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655 

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670 

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720 
His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735 

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750 

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780 

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815 

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830 

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925 

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940 

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975 

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990 

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005 

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020 
Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215 

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230 

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325 
Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390 

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535 

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775 

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790 

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805 

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser 
1810 1815 1820 

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855 

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg 
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg 
1875 1880 1885 

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900 

Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu Gly Gly Ala Ala Arg
1905 1910 1915 1920

Ala 